REPORT DOCUMENTATION PAGE

AD-A277 605

1. AGENCY USE ONLY (Leave blank)

3. REPORT TYPE AND DATES COVERED: Annual, 01 Nov 92 to 31 Oct 93


4. TITLE AND SUBTITLE: PSYCHOPHYSICS OF COMPLEX AUDITORY AND SPEECH STIMULI

5. FUNDING NUMBERS: F49620-93-1-0033; 61102F; 2313; AS

6. AUTHOR(S): Dr Richard E. Pastore


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Dept of Psychology
State University of New York
P.O. Box 6000
Binghamton, NY 13902-6000

8. PERFORMING ORGANIZATION REPORT NUMBER: AFOSR-TR-94-0108

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
AFOSR/NL
110 Duncan Avenue, Suite B115
Bolling AFB DC 20332-0001

Dr John F. Tangeny


10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution unlimited.


13. ABSTRACT (Maximum 200 words)

A major focus of the primary project is the use of different procedures to provide converging evidence on the nature of perceptual spaces for speech categories. Completed research examined initial voiced consonants, with results providing strong evidence that different stimulus properties may cue a phoneme category in different vowel contexts. Thus, /b/ is cued by a rising second formant (F2) with the vowel /a/, requires both F2 and F3 to be rising with /i/, and is independent of the release burst for these vowels. Furthermore, cues for phonetic contrasts are not necessarily symmetric, and the strong dependence of prior speech research on classification procedures may have led to errors. Thus, opposite (falling F2 and F3) transitions lead to somewhat ambiguous percepts (i.e., not /b/) that may be labeled consistently (as /d/ or /g/) but require a release burst to achieve high category quality and similarity to category exemplars. Ongoing research is examining cues in other vowel contexts and is using additional procedures to evaluate the nature of the interaction between cues for categories of both speech and music.


17. SECURITY CLASSIFICATION OF REPORT: (U)
18. SECURITY CLASSIFICATION OF THIS PAGE: (U)
19. SECURITY CLASSIFICATION OF ABSTRACT: (U)
20. LIMITATION OF ABSTRACT: (UL)

Psychoacoustics  & 
Auditory  Cognition 
Laboratory 

Department  of  Psychology 
SUNY  -  Binghamton 
Binghamton,  NY  13902 


1993  Annual  Progress  Report 




AFOSR  Grants 


F49620-93-1-0033


Richard  E.  Pastore,  Project  Director 




1993 Annual Report for AFOSR Grants F49620-93-1-0033 and [illegible]
Organization of this Report

This report is intended to provide a sampling, or snapshot, of the status of the research program supported by the Air Force Office of Scientific Research under Grant F49620-93-1-0033 and a supplemental AASERT award [number illegible]. Instead of preparing a special report describing each facet of the ongoing and completed research, this report is a compilation of manuscripts describing the major facets of the research. These documents are organized according to their status with respect to publication in peer-reviewed journals.

The first section of this report contains one manuscript that currently is under review.

The second section contains four completed manuscripts that are about to be submitted for review. These manuscripts range in completion from being essentially ready to be dropped in the mail (e.g., Cho et al.) to requiring a little more critical reading and fine-tuning before submission (e.g., Hall et al.).

The third section contains detailed reports on two ongoing projects that are not described in any of the other sections of this report. These reports provide a detailed introduction to each study, summarize the results obtained to date, and discuss those results. They are intended to be the basis of subsequent manuscripts describing the results.

The final section of this report contains papers or posters presented at professional meetings (Acoustical Society of America and Psychonomic Society) whose research topic is not otherwise covered in this report. For example, Feature Integration Theory and illusory conjunctions are the focus of completed and ongoing research. However, because a manuscript or detailed report is at least a month away from reaching a stage adequate for public access, the poster has been included. As a contrary example, the contents of the Acoustical Society papers by Hall and by Cho (listed below) are covered thoroughly in the completed manuscripts on which these individuals are first authors. Therefore, no added effort was made to provide a transcription of these oral presentations.

The manuscripts, reports, papers, and posters (listed in the Research Bibliography) primarily describe current findings, but most also represent the basic subject and approach of continuing research.

Patent  Statement 

The research completed to date has not resulted in any findings or developments that are appropriate for any type of patent application, and no such application has been sought.


Research  Overview 

The major objectives of the ongoing research projects involved delineating the nature of the processes that determine the perception of complex acoustic stimuli. The research has not developed any new techniques for investigating perception. It is unique, however, in using a set of established procedures to provide a comprehensive picture of perception and converging evidence for the nature and role of specific cues in perception. The stimulus class (e.g., speech, music, tones) for a given set of experiments is selected according to which class can most effectively address a given critical research question.

The manuscripts and specific reports contained in this document attest to the success and importance not only of the completed research but also of the ongoing projects. These statements have been prepared in the forms appropriate for standard scientific peer review. If needed, this document, or the next annual document, can be modified to include the applied implications of the research findings.


Research  Bibliography 

Manuscripts Under Review

¹Li, X.-F., & Pastore, R.E. Perceptual constancy of a global spectral property: Spectral slope discrimination. Journal of the Acoustical Society of America (accepted pending revisions).


Manuscripts  about  to  be  submitted 

¹Acker, B., Pastore, R.E., & Hall, M.D. Within-category discrimination of musical chords: Perceptual magnet or anchor? Perception & Psychophysics.

¹Cho, J.L., Hall, M.D., & Pastore, R.E. Normalization of musical instrument timbre. Journal of Experimental Psychology: Human Perception and Performance.

¹Hall, M.D., & Pastore, R.E. Effects of stimulus complexity on the perceptual organization of musical tones. Journal of Experimental Psychology: Human Perception and Performance.

¹Huang, W., Hall, M.D., & Pastore, R.E. Mapping percepts in the major variant of the octave illusion. Perception & Psychophysics.


Paper Presentations

¹Hall, M.D., & Pastore, R.E. (1993). An auditory analogue to feature integration. Psychonomic Society, Washington, DC, Nov. 4, 1993 [Poster presentation].

Pastore, R.E. (1993). Implicit assumptions in modeling higher level auditory processes. Journal of the Acoustical Society of America [vol./p. illegible] [Abstract of invited paper].

Huang, W., Hall, M.D., & Pastore, R.E. (1993). An illusion based on dichotic fusion of harmonically related tones. Journal of the Acoustical Society of America [vol./p. illegible] [Abstract of poster].

²Cho, J.L., Hall, M.D., & Pastore, R.E. (1993). Stimulus properties critical to normalization of instrument timbre. Journal of the Acoustical Society of America [vol./p. illegible] [Abstract of paper].

²Li, X.-F., & Cho, J. (1993). An exploration of phoneme structure and models of classification for place of articulation. Journal of the Acoustical Society of America [vol./p. illegible] [Abstract of paper].


Detailed Reports on Work in Progress (when not summarized above)

Pastore, R.E., Farrington, S., & Jassal, S. Measuring the DL for identification of order of onset for complex auditory stimuli.

Pastore, R.E., Acker, B., Cho, J.L., Li, X.-F., & Farrington, S. Exploration of the perceptual structure of cues for place of articulation.


¹ Manuscript included in this report.

² Paper presentation NOT included in this report; a more complete description of the work is included in this report.
Major aspects of this research are continuing.


Research Staff


Faculty

Richard E. Pastore, PhD

Graduate Students

Xiaofeng (Sheldon) Li, PhD¹
Michael Hall, MA
Wenyi Huang, MA
Jennifer Cho, MA²
Barbara Acker, BA³
Shannon Farrington, BA⁴
Sajni Jassal⁵


Undergraduate Students
Denise Kolavera
Laura Pevsei
[name illegible]
Shannon Farrington


¹Sheldon Li received his PhD from our program but has continued to work on projects in the laboratory. Dr. Li recently completed post-doctoral work with Dr. Charles Watson at Indiana University. He now is with the Department of Electrical and Computer Engineering, The Johns Hopkins University.

²Jennifer Cho currently is a part-time student while working as a co-op (intern) on a Navy project with IBM-Owego (Loral-Owego).
³Barbara Acker is supported by the AASERT award. She will complete her MA in [year illegible].
⁴Shannon Farrington worked on the project both as a SUNY-Cortland undergraduate and as a Binghamton graduate student.


⁵Sajni Jassal, a graduate student in another laboratory, worked on this project over the Summer of 1993.




Spectral  Slope  Discrimination 


Page 1


Perceptual Constancy of a Global Spectral Property:
Spectral Slope Discrimination
Xiaofeng Li* and Richard E. Pastore

Department  of  Psychology 
and 

Center for Cognitive and Psycholinguistic Sciences
State  University  of  New  York  at  Binghamton 


Running  head:  SPECTRAL  SLOPE  DISCRIMINATION 


* Currently affiliated with the Center for Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University.


Abstract 

The current study investigated the perceptual constancy of spectral slope discrimination when the fundamental frequency and spectral shape of the stimuli were varied across the stimuli to be discriminated on a single trial. The three stimulus variables, all of which were global or emergent properties of a complex sound, represented two sound-source properties and a filter property. Each stimulus was synthesized by passing a source spectrum through a filter transfer function, according to the source-filter model of complex sound production. Four experiments were conducted. Experiment 1 examined the effect of differences in overall stimulus level on spectral slope discrimination. Experiments 2 and 3 investigated, respectively, the effects of variation in the fundamental frequency and in a filter property on spectral slope discrimination. Experiment 4 was designed to resolve two issues raised in the preceding experiments. The current study showed a significant performance decrement in spectral slope discrimination when a second source property (fundamental frequency) was varied. However, little detrimental effect was observed when the filter property (spectral shape) was varied. The study supports claims that listeners treat source properties as a unit that is relatively independent of filter properties.


PACS numbers: 43.66.Jh, 43.66.Lj


Journal of the Acoustical Society of America

[accepted pending revision]






One goal of psychoacoustics research is to evaluate the important logical possibility that principles discovered and results obtained from studies using simple stimuli (e.g., pure tones and noise bursts) can be extended to explain speech perception. Unfortunately, evidence accumulated over the years indicates that, other than some very general findings (e.g., simultaneous masking), such efforts typically have not been very successful (Pastore, 1981; Watson, 1991; Watson, Qiu, Chamberlain, & Li, 1993). For example, Christopherson and Humes (1992) examined the relationship between listeners' abilities to process a wide variety of simple auditory stimuli and their abilities to identify and discriminate speech sounds. Although the abilities to process a battery of simple stimuli were highly correlated among themselves, these abilities did not predict performance in speech perception.

The failure of past psychoacoustics research to predict performance in speech perception tasks may be due in part to its use of very simple stimuli. It may also reflect that research's concentration on the processing of the fine, rather than the global, structure of complex auditory stimuli. The strategy used to listen selectively to acoustic details may well differ from the strategy used to listen to global aspects of speech and other complex nonspeech sounds. To detect details of complex auditory stimuli, listeners are instructed to focus attention on very specific stimulus properties. Experiments that use such procedures to evaluate the limits on sensory processing have made significant contributions to our understanding of the physiological mechanisms by which the cochlea processes frequency and intensity. However, to understand speech, the auditory system may not have to resolve all details in a speech signal; in fact, such detailed, focused processing might hinder more integrative, global processing. Instead, the most relevant stimulus properties, and the redundancy of the various cues in speech sounds, may require listeners to capture global properties of the sounds. In a normal listening environment, the fine structure of speech sounds is frequently modified by various environmental factors or by mixing with other sounds (including other speech sounds). A "global structure" listening strategy therefore seems to be more ecologically valid than a "fine structure" listening strategy.

In contrast to earlier work, recent psychoacoustics research has reported that listeners can use information across broad frequency ranges even when asked to detect a change in a local frequency component. Examples include studies of profile analysis (e.g., Green, 1988), comodulation masking release (e.g., Hall et al., 1984), comodulation detection difference (McFadden, 1987; Wright, 1990), modulation detection interference (Yost & Sheft, 1989; Yost et al., 1989), and correlational listening (Cohen & Schubert, 1987). These lines of research infer global processing from changes in the detection or discrimination of frequency components. The current study differs from these efforts in that it directly investigates listeners' abilities to discriminate a global spectral property that has been identified as a major factor in determining speech quality.

The global property investigated in the current study is the spectral slope of complex stimuli. The stimuli were synthesized with a simple harmonic structure on the basis of the source-filter model of speech production (Fant, 1960). As in speech, a stimulus is produced by passing a source spectrum through a filter transfer function (equivalently, by convolving the source waveform with the filter's impulse response). In the current study, the source spectrum is composed of the first 20 harmonics of a given fundamental frequency (F0), with their intensities specified by a decreasing spectral envelope. The stimuli vary in the slope of this spectral envelope (spectral slope). They also differ in a number of other properties, including the fundamental frequency (typically another property of the sound source) and the number of spectral peaks (typically a filter, or resonator, property rather than a source property). Listeners are asked to discriminate spectral slope while ignoring irrelevant variation in fundamental frequency, spectral peaks, and overall stimulus intensity. The current study thus examines the perceptual invariance of spectral slope in the context of variation in frequency composition (fundamental frequency) and in spectral shape (spectral peaks).

The choice of the three variables (fundamental frequency, resonator characteristic, and spectral slope) is motivated by the resemblance of these properties to important aspects of speech sounds. Manipulating these three stimulus variables (dimensions) thus captures important variations observed in speech. Speech produced by male and female speakers differs largely in fundamental frequency. The pattern of spectral peaks defines formant structure and perceptually differentiates linguistic categories of speech sounds (e.g., vowels). Spectral slope, on the other hand, determines various paralinguistic characteristics of a speaker (e.g., phonation and voicing style; Monsen & Engebretson, 1977; also summarized in Stevens, 1989). For example, breathy voice tends to have a steeper spectral slope than modal voice, whereas creaky voice has a shallower spectral slope than modal voice. In addition, spectral slope differs between male and female speakers, with female voices typically exhibiting a steeper spectral slope than male voices (Klatt & Klatt, 1990; Monsen & Engebretson, 1977; Price, 1989). The three stimulus dimensions therefore represent two categories of information typically carried in a complex sound (e.g., speech). Both fundamental frequency and spectral slope define source properties, and the structure of spectral peaks defines a resonator characteristic.

The use of the source-filter model in synthesizing the stimuli, and in examining the relationship between source and filter properties, also is motivated by ecological validity. Specifically, a complex sound can be viewed as the output of a sound source convolved with a filter. Listeners are believed to possess a highly developed ability to parse a complex sound into its source and filter transfer function on the basis of implicit knowledge acquired through experience (Gaver, 1993; Li, Logan, & Pastore, 1991; McAdams, 1993; Woods & Colburn, 1992). The most familiar applications of the source-filter model are in studies of vowel recognition and speaker identification. A vowel sound is produced by passing the glottal excitation source through the vocal tract, which imposes formant structure on the spectrum. Listeners are able not only to use formant structure to identify the vowel category, but also to recover the speaker characteristics carried by the glottal source. The current study evaluates the influence of variation in a filter property on listeners' ability to discriminate a source property, and thus investigates the perceptual relationship between the two. Gagne and Zurek (1988) examined the effects of the speech source on formant discrimination, which represents a type of mirror image of aspects of the current study. Because a complex sound can be the joint product of any sound source and any filter transfer function, all such studies, taken to the extreme, are bound to fail. However, by carefully choosing the source and filter transfer function, such studies should help us understand the perceptual interaction between the properties determined by the source and by the filter transfer function. The current study therefore used typical speech parameters in defining the stimulus properties.




Although little psychophysical research has directly examined perceptual invariance or constancy for global stimulus properties of complex sounds, many speech studies have revealed a constancy in listeners' abilities to map phonetic categories from speech sounds that vary significantly in waveform. Nonlinguistic variations in speech sounds are hypothesized to be "removed" or "partitioned out" prior to phonetic identification, and this type of finding has therefore been called "normalization." The "normalization" processes typically are described as factoring out fundamental frequency (intrinsic "normalization"), the characteristics of the speaker (extrinsic "normalization"), or both. Global invariant properties, such as the formant structure of vowels or the spectral shape of stop consonants (see below), are then easily extracted for identification of the phonetic category. However, listeners' abilities to "normalize" speech and other complex sounds have been explored only superficially, and the mechanisms of "normalization" are not understood. The concept of "normalization" used in this study thus refers only to the recovery of relational (sometimes also global), rather than absolute, properties in the recognition of an auditory event. Evidence for the use of global properties in speech perception was reported by Stevens and Blumstein (1981). They identified three types of global spectral shape at initial consonantal release as important cues for the classification of stop consonants. Despite variability in the microstructure of consonantal spectra, Stevens and Blumstein (1978; Blumstein & Stevens, 1979) found that diffuse-rising and diffuse-falling spectral shapes cue, respectively, alveolar and bilabial consonants. They also found that a spectral shape compact in the mid-frequency range cues velar consonants.

Speech sounds are not the only auditory events that involve the recovery of certain global properties. Musical melodies, for example, are recognized by relative frequency relations rather than absolute frequencies. Changes in absolute frequency have very little effect on listeners' abilities to identify a melody as long as the relative frequencies among the components are preserved. Kidd and Watson (1989) used a sequence of five tones to examine listeners' abilities to detect a frequency change in a target component. In their task, the tonal sequence was transposed in absolute frequency while the frequency relations among the tonal components remained constant. Surprisingly small amounts of frequency transposition (1-2 semitones) led to a very large increase in thresholds. However, minimizing pattern uncertainty (by presenting the same pattern on every trial) resulted in dramatic improvements in performance; in some cases, the thresholds were almost comparable to those for absolute frequency detection. This study suggests that listeners are able to extract a relational frequency property, at least from familiar patterns. In a more direct study of the "normalization" of musical sounds, Cho, Pastore, and Hall (1991) demonstrated that musically experienced listeners can factor out differences in instrument timbre while perceiving chords.

The current study, consisting of four experiments, evaluates the discrimination of the global property of spectral slope.

Discriminating spectral slope requires listeners to rely on the intensity relationship across the spectrum. Experiment 1 examines a potential confounding variable that listeners may use when asked to discriminate spectral slope. The remainder of the study has two major parts. The first part (Experiment 2) examines how spectral slope discrimination is affected when the frequency composition of the stimulus is changed through variation in fundamental frequency. This design is a type of analog to processing speech sounds in which the speaking characteristics remain constant despite variation in fundamental frequency. The second part (Experiment 3) modifies the microstructure of the spectral envelope by imposing spectral peaks. In understanding speech, listeners need to rely on the gross spectral filter transfer function across frequencies regardless of amplitude variation at local frequencies. In identifying a voice, listeners need to rely on gross tendencies in source properties (F0 and spectral slope) regardless of the filter properties imposed by the vocal tract. Thus, the current design investigates the invariance of speaking characteristics in the speech a speaker produces despite differences in vocal tract configuration across phonemic categories. Experiment 4 explores an issue not resolved by Experiments 2 and 3.


General  Method 


Stimuli 

The 200-ms stimuli were line spectra synthesized by digitally summing 20 harmonically related sinusoids in sine phase. The digital stimuli were shaped by 5-ms linear onset and offset ramps. Following D/A conversion (12-bit at a 10-kHz sample rate), the stimuli were low-pass filtered at 4 kHz (via a series of ITHACO Model 4302 filters, yielding 48 dB/octave). They were then presented binaurally over TDH-49P headphones to subjects in an acoustic chamber. For each stimulus, the intensity of each frequency component was specified mathematically for a given spectral envelope. In Experiments 1 and 2 (the latter varied F0 as the irrelevant dimension), each stimulus had a linear, flat (ripple-free) spectral envelope with a negative spectral slope, as shown in Figure 1a and Eq. (1),

$$I_f(i) = s\,(i-1) + b, \qquad (1)$$

where $I_f(i)$ is the intensity (in dB) of the $i$th frequency component ($i = 1$ to 20), $s$ is the spectral slope (in dB per frequency component), and $b$ is the F0 intensity. For the flat-spectrum stimuli, $b$ was set to the maximum level (72 dB), which produced no clipping of the waveform. Six levels of spectral slope were chosen, ranging from -0.50 to -1.75 dB per frequency component in steps of -0.25 dB. An example of these flat-spectrum stimuli is illustrated in Figure 1a.


Insert Figure 1 about here


Experiment 3 used a resonator characteristic as the irrelevant dimension. The spectral envelope for each of these stimuli was determined by imposing a sinusoidal (spectral) resonator transfer function on a flat spectral slope stimulus. We used a sine function to simulate a formant-like characteristic. The resonator spectral envelope is specified as

$$I_r(i) = m\,\sin(2\pi k i/20), \qquad (2)$$

where $I_r(i)$ is the weighting function (in dB) applied to the amplitude of the $i$th frequency component. This function specifies a resonator depth ($m = 4$ dB) and the number of resonator frequencies ($k$ = the number of poles, or spectral peaks). For these ripple-spectrum stimuli, the F0 intensity ($b$ in Eq. 1) was reduced to 68 dB to avoid clipping. An example of these stimuli is shown in Figure 1b. This general type of resonator characteristic also was used by Bernstein and Green (1987).
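The two envelope equations can be sketched numerically. The Python below is a minimal illustration, not the authors' synthesis software. It assumes that the ripple of Eq. (2) is added in dB to the flat envelope of Eq. (1), which is consistent with lowering b from 72 to 68 dB when m = 4 dB. It also treats levels as relative dB rather than calibrated SPL. The F0 of 150 Hz is a hypothetical example value, because the F0s used are not given here. The sketch also computes the power-summed overall level, which shows how overall level falls as the slope steepens (the potential confound taken up in Experiment 1).

```python
import math

SAMPLE_RATE_HZ = 10_000  # 10-kHz sample rate (Method)
DURATION_S = 0.200       # 200-ms stimuli
RAMP_S = 0.005           # 5-ms linear onset/offset ramps
N_HARMONICS = 20         # first 20 harmonics of F0


def component_levels_db(slope_db, f0_level_db=72.0, ripple_depth_db=0.0, n_peaks=0):
    """Level in dB of harmonics i = 1..20.

    Flat envelope, Eq. (1): s * (i - 1) + b.
    Optional sinusoidal ripple, Eq. (2): m * sin(2*pi*k*i/20),
    ASSUMED here to add to the flat envelope in dB.
    """
    levels = []
    for i in range(1, N_HARMONICS + 1):
        flat = slope_db * (i - 1) + f0_level_db
        ripple = ripple_depth_db * math.sin(2 * math.pi * n_peaks * i / N_HARMONICS)
        levels.append(flat + ripple)
    return levels


def overall_level_db(levels_db):
    """Overall level: power sum of components at distinct frequencies."""
    return 10 * math.log10(sum(10 ** (lv / 10) for lv in levels_db))


def synthesize(levels_db, f0_hz=150.0, ref_db=72.0):
    """Sum harmonics in sine phase and apply linear onset/offset ramps.

    f0_hz is a hypothetical example value; ref_db is an arbitrary digital
    reference that maps to amplitude 1.0 (not a calibrated SPL).
    """
    if f0_hz * len(levels_db) >= SAMPLE_RATE_HZ / 2:
        raise ValueError("highest harmonic must lie below the Nyquist frequency")
    n = round(DURATION_S * SAMPLE_RATE_HZ)
    n_ramp = round(RAMP_S * SAMPLE_RATE_HZ)
    amps = [10 ** ((lv - ref_db) / 20) for lv in levels_db]
    samples = []
    for k in range(n):
        t = k / SAMPLE_RATE_HZ
        x = sum(a * math.sin(2 * math.pi * (h + 1) * f0_hz * t)
                for h, a in enumerate(amps))
        gain = min(1.0, k / n_ramp, (n - 1 - k) / n_ramp)  # linear ramps
        samples.append(gain * x)
    return samples


# The six slopes of Experiments 1 and 2: -0.50 to -1.75 dB per component.
SLOPES_DB = [-0.50 - 0.25 * j for j in range(6)]

for s in SLOPES_DB:
    level = overall_level_db(component_levels_db(s))
    print(f"slope {s:+.2f} dB/component -> overall level {level:.2f} dB")
```

Because F0 intensity is held fixed, every step toward a steeper slope removes energy from the upper harmonics. This is why the overall level printed by the loop drops monotonically across the six slopes.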

Procedure 

Based upon a pilot study, an XAB discrimination task seemed to be more effective than an AX task; in fact, all subjects had a great deal of difficulty performing the AX version of the task. On each trial, a standard stimulus (X) was always presented in the first interval, followed by the A and B test stimuli. One of the test stimuli was identical in spectral slope to the standard stimulus, and the other differed in spectral slope. Subjects indicated which of the two test stimuli matched the standard stimulus in spectral slope by pressing a button on a response box. The correct response to both panels in Figure 2 is stimulus B. (The explanation for this figure is given below.) Trial-by-trial feedback was provided following the subjects' responses. The three stimuli on each trial were separated by 500 ms, and there was an 800-ms intertrial interval.

The current study employed two types of experimental conditions to examine the effects of variation in an irrelevant dimension on the discrimination of a relevant dimension (spectral slope). On a single trial in a roving-irrelevant-dimension condition, two stimulus dimensions varied simultaneously. Listeners were required to respond to the difference on the relevant dimension but to ignore the variation on the irrelevant dimension. In this roving condition, the two test stimuli (A and B) on a single trial had the same value on the irrelevant dimension, and that value differed from the value of the standard stimulus (X). On a trial in a fixed-irrelevant-dimension condition, a single value of the irrelevant dimension was used for all three stimuli. Figure 2 shows two roving conditions that differ in the type of irrelevant dimension: fundamental frequency (Fig. 2a) and number of spectral peaks (Fig. 2b).


Insert  Figure  2  about  here 


For each subject, a hit/false-alarm matrix was constructed for each experimental condition as a function of differences in spectral
slope. Discrimination indices (d') were then calculated from this matrix following the ABX response model suggested by Macmillan
and Creelman (1991; also Pierce & Gilbert, 1958).
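The text does not spell out which ABX model was used. As a hedged sketch, assuming the independent-observation strategy for an unbiased observer that Macmillan and Creelman (1991) describe for ABX designs (applied here to XAB trials), d' can be recovered by first converting the hit and false-alarm rates into an unbiased proportion correct and then inverting the model numerically. The function names and the bisection search are illustrative choices, not the authors' procedure.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution (mean 0, sd 1)
PHI = _STD.cdf       # cumulative distribution function
Z = _STD.inv_cdf     # inverse CDF (z transform)
ROOT2 = 2 ** 0.5


def abx_pc_from_dprime(d):
    """Proportion correct predicted for an unbiased observer under the
    independent-observation ABX model (an assumption in this sketch)."""
    return PHI(d / ROOT2) * PHI(d / 2) + PHI(-d / ROOT2) * PHI(-d / 2)


def abx_dprime(hit_rate, fa_rate, tol=1e-9):
    """Estimate d' from the hit and false-alarm rates of an ABX/XAB task.

    Both rates must lie strictly between 0 and 1; correct perfect
    scores (e.g., with 1/(2N)) before calling.
    """
    # Unbiased proportion correct implied by the hit/false-alarm pair.
    pc_max = PHI((Z(hit_rate) - Z(fa_rate)) / 2)
    if pc_max <= 0.5:
        return 0.0  # at or below chance: no measurable sensitivity
    # The model's proportion correct rises monotonically with d' >= 0,
    # so bisection on [0, 20] finds the d' that reproduces pc_max.
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if abx_pc_from_dprime(mid) < pc_max:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `abx_dprime(0.80, 0.20)` returns a positive d' that grows as the hit rate rises and the false-alarm rate falls; equal hit and false-alarm rates give d' = 0.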

Subjects 

Six SUNY-Binghamton students were paid for their participation in this study. The subjects were originally naive to
psychoacoustic tasks, and all reported normal hearing.


Experiment 1

Spectral slope was defined in terms of a systematic decrease in the intensity of a frequency component with an increase in frequency.
Because the current study fixed the F0 intensity, stimuli with a steeper spectral slope had lower intensity at high frequencies relative to
those with a shallower spectral slope. Thus, spectral slope co-varied with overall stimulus intensity and with intensity in the upper portion
of the spectrum (by definition, the manipulation of spectral slope must co-vary with the absolute intensity of some portions of the
stimulus spectrum). As a result, instead of judging the global property of spectral slope, the subjects might be able to perform the
task by comparing either overall stimulus intensity or intensity in a specific high-frequency region. Experiment 1 explored the possible
use of either of these alternative listening strategies in the spectral slope discrimination task. Two experimental conditions were used.
In the fixed-level condition, the intensity was equal for the three stimuli on a trial, and therefore, overall stimulus level was
determined solely by spectral slope (see Eq. 1). In the roving-level condition, F0 intensity was determined randomly and
independently over a range of 20 dB for each of the three stimuli on a trial, thus precluding effective use of absolute level in
performing the spectral slope discrimination task. This roving-intensity procedure has been used in profile analysis studies to
eliminate responding to absolute level (Green, 1988; Mason, Kidd, Hanna, & Green, 1984).
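The confound that Experiment 1 tests can be made concrete with a little arithmetic. The sketch below assumes, for illustration only, that harmonic n has level b + s(n - 1) dB (b = F0 level, s = slope in dB per harmonic) and that component powers add incoherently. The harmonic count is a hypothetical value, since this section does not state the stimulus bandwidth.

```python
import math


def harmonic_levels(f0_level_db, slope_db_per_harmonic, n_harmonics):
    """Levels (dB) of harmonics 1..n, assuming L_n = b + s * (n - 1)."""
    return [f0_level_db + slope_db_per_harmonic * (n - 1)
            for n in range(1, n_harmonics + 1)]


def overall_level_db(levels_db):
    """Overall level of the complex: sum component powers, convert to dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


N_HARMONICS = 29    # hypothetical: 29 x 170 Hz is about 4.9 kHz
F0_LEVEL_DB = 68.0  # F0 level held fixed; the level difference below does not depend on it

for slope in (-0.50, -1.75):  # shallowest and steepest slopes used
    level = overall_level_db(harmonic_levels(F0_LEVEL_DB, slope, N_HARMONICS))
    print(f"slope {slope:+.2f} dB/harmonic -> overall level {level:.1f} dB")
```

Under these assumptions the shallowest slope yields a complex roughly 4.7 dB more intense overall than the steepest one, even though the F0 component is identical, which is why a 20-dB rove of F0 level makes absolute level an unreliable cue.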

There are three possible patterns of outcomes for the two conditions. First, if subjects use overall stimulus
level as a cue but cannot respond directly to spectral slope, discrimination performance should be high in the fixed-level condition
and at chance in the roving-level condition. This finding would suggest that the current project should be discontinued. Second, if
subjects use information only about spectral slope, and thus do not use the correlated intensity cue, equivalent discrimination
performance should be observed in the two conditions, and subsequent experiments then need not control for differences in overall
stimulus level caused by varying spectral slope. Finally, if listeners use both types of information, discrimination performance should
be higher in the fixed-level condition than in the roving-level condition, with performance for both conditions above chance. This last
outcome would require subsequent experiments to rove overall stimulus level to eliminate the correlated intensity cue.

Stimuli and Procedure

Six stimuli were synthesized using the six levels of spectral slope with a 170-Hz fundamental frequency. For the roving-level


Spectral  Slope  Discrimination 


Page  5 


condition, the overall intensity of each stimulus within a trial was determined randomly over a 20-dB range in 1-dB steps, and thus
the F0 intensity ranged from 52 to 72 dB. Roving overall stimulus intensity was implemented using a Charybdis Model D
programmable attenuator. Subjects completed 12 blocks of trials. The 60 trials in a block were created by crossing each of the six
levels of spectral slope with every other level, with equal assignment of the paired spectral slope values as the standard stimulus.
Data collection began after 720 practice trials.
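One way to build the 60-trial XAB blocks described above is sketched below. The text does not say exactly how 60 trials arise from six slope levels; this sketch assumes 30 ordered (standard, comparison) pairs, each presented once with the matching stimulus in position A and once in position B. The slope values are the six levels from -0.50 to -1.75 dB per harmonic given in the description of Experiment 4; `make_block`, `XABTrial`, and the fixed level of 68 dB are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import permutations

SLOPES_DB = (-0.50, -0.75, -1.00, -1.25, -1.50, -1.75)  # dB per harmonic
ROVE_LEVELS_DB = tuple(range(52, 73))  # F0 level: 52-72 dB in 1-dB steps


@dataclass(frozen=True)
class XABTrial:
    x_slope: float    # standard stimulus (first interval)
    a_slope: float    # first test stimulus
    b_slope: float    # second test stimulus
    levels_db: tuple  # F0 levels for X, A, B
    correct: str      # "A" or "B": the test stimulus matching X


def make_block(rng, rove_level=True, fixed_level_db=68):
    """Return a shuffled 60-trial block (assumed construction)."""
    trials = []
    for standard, other in permutations(SLOPES_DB, 2):  # 30 ordered pairs
        for match in ("A", "B"):  # matching stimulus in each test position
            a, b = (standard, other) if match == "A" else (other, standard)
            if rove_level:
                # Each of the three stimuli gets its own random level.
                levels = tuple(rng.choice(ROVE_LEVELS_DB) for _ in range(3))
            else:
                levels = (fixed_level_db,) * 3  # hypothetical fixed level
            trials.append(XABTrial(standard, a, b, levels, match))
    rng.shuffle(trials)
    return trials


block = make_block(random.Random(1))
print(len(block), block[0])
```

Seeding `random.Random` makes a block reproducible; a fresh seed per block gives a new random order and new level draws.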

Results  and  Discussion 

Figure 3 shows the individual discrimination performance for the two conditions as a function of slope difference. The solid
curves with open circles show the results of the fixed-level condition, and the dotted curves with filled circles show those of the
roving-level condition. These curves are essentially psychometric functions, plotting discrimination ability as a function of the
magnitude of the spectral slope difference. However, they differ from more standard psychometric functions in that each point along
the abscissa represents the pooling of all stimulus pairs that differ by the given magnitude of spectral slope. The fixed-level condition
for all subjects, and the roving-level condition for some subjects, reach ceiling (d' > 6.0) at the largest slope differences. In fact, even
d' values of 5.0 or greater cannot be very accurate, since differences in z scores there reflect extremely small changes in the upper
5% tail of the normal distribution. Moreover, the d' values for the larger slope differences were based upon fewer stimulus pairs
than those for the smaller slope differences (Experiment 4 will examine the importance of unequal sampling).
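The point about d' values of 5.0 or more can be checked numerically. The sketch below uses the simpler yes/no case for an unbiased observer (hit rate Φ(d'/2), false-alarm rate 1 - Φ(d'/2)) rather than the ABX model, purely to illustrate how flat the normal tail is in that region.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def unbiased_hit_rate(d_prime):
    """Hit rate of an unbiased yes/no observer: H = Phi(d'/2)."""
    return PHI(d_prime / 2)


for d in (4.0, 5.0, 6.0):
    print(f"d' = {d:.1f}: hit rate {unbiased_hit_rate(d):.4f}")
```

Going from d' = 5.0 to d' = 6.0 moves the hit rate only from about 0.9938 to about 0.9987, a change of less than half a percentage point.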


Insert  Figure  3  about  here 


With the exception of Subject 2, all subjects exhibited linear psychometric functions for both the fixed- and roving-level conditions, but
with a shallower slope for the roving-level condition. The shallower slope for the roving-level condition than for the fixed-level
condition indicates that the subjects were all using absolute stimulus intensity information in the fixed-level condition, but could still
perform the spectral slope discrimination task when absolute intensity provided no valid information. Subject 2 was not able to
perform the roving discrimination task to a reasonable level (e.g., d' > 1) even at the largest difference in spectral slope, indicating
that this subject was probably using only absolute intensity to perform the discrimination.¹ The average psychometric functions
(Figure 3g) thus are based upon the five subjects who could perform the discrimination task under the roving condition (i.e., excluding
Subject 2).

Both mean psychometric functions are highly linear before reaching ceiling, with essentially zero intercepts (r² = 0.99 for both
conditions). The two functions differ solely in the slope of the linear regression equations (7.6 for the fixed-level and 4.9 for the
roving-level condition). The ratio of these values is 1.55, which can be interpreted as the absolute intensity cue contributing
approximately 35% (1 - 1/1.55 ≈ 0.35) of the decision information in the fixed-level discrimination. An alternative interpretation of this
ratio (adopting the formula from Macmillan, Braida, & Goldberg, 1987) is that roving intensity increases the variability of the perceptual
decision, with the ratio of variances being 1.40; this interpretation is most appropriate for the later experiments, where the roving
variable clearly functions only by adding noise, rather than by also eliminating a correlated cue for discrimination.
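The slope comparison above can be reproduced with a short calculation. The sketch below fits a least-squares line to the points below the d' ceiling and converts a pair of slopes into the two quantities discussed in the text. The variance formula, (slope_fixed / slope_roving)² - 1, is our reading of the adapted Macmillan, Braida, and Goldberg (1987) formula: it assumes roving only adds independent Gaussian noise, so d' shrinks by the square root of the variance ratio. It reproduces the reported values to within rounding, but it may not be the authors' exact formula.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


def fit_below_ceiling(slope_diffs, dprimes, ceiling=6.0):
    """Fit only the points with d' below the ceiling, as in the text."""
    kept = [(x, y) for x, y in zip(slope_diffs, dprimes) if y < ceiling]
    return fit_line([x for x, _ in kept], [y for _, y in kept])


def cue_share(slope_fixed, slope_roving):
    """Fraction of fixed-condition sensitivity lost when the cue is removed."""
    return 1 - slope_roving / slope_fixed


def added_variance(slope_fixed, slope_roving):
    """Extra decision variance from roving, relative to the fixed condition,
    assuming roving only adds independent Gaussian noise."""
    return (slope_fixed / slope_roving) ** 2 - 1
```

With the reported slopes, `cue_share(7.6, 4.9)` is about 0.355 and `added_variance(7.6, 4.9)` about 1.41 (1.40 when the ratio is first rounded to 1.55), while `added_variance(4.4, 3.6)` is about 0.49, the value reported for Experiment 2.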

A 2 × 5 within-subject analysis of variance was performed on the d' values. The significant main effect of spectral slope
difference confirmed that discrimination improved as the spectral slope difference increased (F(4, 20) = 62.90, p < .05), as
exhibited by the linear psychometric functions. The significant main effect of condition indicated better discrimination
performance for the fixed-level condition than for the roving-level condition (F(1, 5) = 33.96, p < .05). The interaction of the fixed-
and roving-level conditions with the spectral slope difference was not significant (F(4, 20) = 2.07, p > .05).
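For readers who want to recompute tests like these, a 2 × 5 within-subject ANOVA partitions the variance into condition, slope-difference, and interaction effects, each tested against its own subject-by-effect error term; with six subjects this gives exactly the degrees of freedom reported, F(1, 5) and F(4, 20). The sketch below is a textbook implementation in plain Python (the function name is hypothetical), not the software the authors used, which the report does not name; it omits p values and sphericity corrections.

```python
from itertools import product


def rm_anova_2x(y):
    """Two-way repeated-measures ANOVA on y[subject][a][b].

    Factor A = condition (fixed vs roving), factor B = slope difference.
    Returns {effect: (F, df_effect, df_error)}.
    """
    n, na, nb = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(na), range(nb)
    cells = list(product(S, A, B))
    grand = sum(y[s][a][b] for s, a, b in cells) / len(cells)

    # Marginal and two-way means.
    m_a = [sum(y[s][a][b] for s in S for b in B) / (n * nb) for a in A]
    m_b = [sum(y[s][a][b] for s in S for a in A) / (n * na) for b in B]
    m_s = [sum(y[s][a][b] for a in A for b in B) / (na * nb) for s in S]
    m_ab = [[sum(y[s][a][b] for s in S) / n for b in B] for a in A]
    m_as = [[sum(y[s][a][b] for b in B) / nb for s in S] for a in A]
    m_bs = [[sum(y[s][a][b] for a in A) / na for s in S] for b in B]

    # Sums of squares for effects and their subject interactions.
    ss_s = na * nb * sum((m_s[s] - grand) ** 2 for s in S)
    ss_a = n * nb * sum((m_a[a] - grand) ** 2 for a in A)
    ss_b = n * na * sum((m_b[b] - grand) ** 2 for b in B)
    ss_ab = n * sum((m_ab[a][b] - m_a[a] - m_b[b] + grand) ** 2
                    for a, b in product(A, B))
    ss_as = nb * sum((m_as[a][s] - m_a[a] - m_s[s] + grand) ** 2
                     for a, s in product(A, S))
    ss_bs = na * sum((m_bs[b][s] - m_b[b] - m_s[s] + grand) ** 2
                     for b, s in product(B, S))
    ss_total = sum((y[s][a][b] - grand) ** 2 for s, a, b in cells)
    ss_abs = ss_total - (ss_s + ss_a + ss_b + ss_ab + ss_as + ss_bs)

    def test(ss_eff, df_eff, ss_err, df_err):
        # Guard against a zero error term (perfectly consistent subjects).
        if ss_err <= 1e-12:
            return float("inf"), df_eff, df_err
        return (ss_eff / df_eff) / (ss_err / df_err), df_eff, df_err

    return {
        "condition": test(ss_a, na - 1, ss_as, (na - 1) * (n - 1)),
        "slope_difference": test(ss_b, nb - 1, ss_bs, (nb - 1) * (n - 1)),
        "interaction": test(ss_ab, (na - 1) * (nb - 1), ss_abs,
                            (na - 1) * (nb - 1) * (n - 1)),
    }
```

Passing six subjects' d' values arranged as `y[subject][condition][slope_difference]` returns F ratios with (1, 5), (4, 20), and (4, 20) degrees of freedom.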

For the roving-level condition, discrimination performance was clearly above chance. To appreciate this finding, the reader is
reminded that, on each trial in the roving-level condition, overall stimulus level was varied randomly across the three stimuli; therefore,
the stimulus with the steepest spectral slope might have a higher overall level than those with shallower spectral slopes. It thus was not
possible for the subjects to make accurate judgments on the basis of overall stimulus level (or absolute intensity in any portion of the
spectrum). This result indicates that the five subjects had to extract some sort of sound quality based on spectral slope from these
complex stimuli. However, compared with that in the fixed-level condition, discrimination performance was poorer in the roving-level
condition. This latter result indicates that all subjects exploited overall stimulus level as an additional cue to the difference in spectral
slope, and thus requires that subsequent experiments eliminate the intensity cue in evaluating spectral slope discrimination. Therefore,
in the subsequent experiments, overall stimulus level will always be varied randomly over a range of 20 dB within a trial. (The fixed
versus roving conditions in Experiments 2-4 will differ in terms of an irrelevant stimulus dimension, with both conditions in each
experiment roving overall stimulus level.)


Experiment  2 

Experiment 2 investigates the effects of variation in the fundamental frequency on spectral slope discrimination. Because
fundamental frequency and spectral slope are both considered sound source properties, it is quite possible that listeners will treat all
sound source properties as a unit in a sound-producing system. The two conditions in this experiment were used to examine the
listener's ability to extract spectral slope. In the first condition, fundamental frequency was fixed within a trial but randomly varied
across trials (the fixed condition). In the second condition, fundamental frequency was randomly varied both within a trial and across
trials (the roving condition). The performance difference between the two conditions will indicate the magnitude of the adverse effect
of variation in the fundamental frequency upon spectral slope discrimination and, therefore, should shed light upon the perceptual
relationship between the two source properties. The research method is similar to that used by Durlach, Tan, Macmillan,
Rabinowitz, and Braida (1989), and represents an accuracy-based version of the Garner (1974) strategy for evaluating integral versus
separable dimensions.

Stimuli  and  Procedure 

The six levels of spectral slope from Experiment 1 were combined with three levels of fundamental frequency (150, 170, and 190
Hz) to create 18 stimuli. These specific fundamental frequencies were chosen to guarantee no common frequency components for
stimuli differing in fundamental frequency. In the fixed condition, the three stimuli on each trial shared a common fundamental
frequency, selected equally often from the three fundamental frequencies. In the roving condition, two different fundamental frequencies
were selected for the standard stimulus and the two test stimuli on each trial. There were six possible combinations of fundamental
frequency pairs, formed by crossing the three fundamental frequencies with every other. For each condition, the subjects completed 12 blocks
of 60 trials, always preceded by 720 practice trials. (Because of roving intensity under all conditions, the fixed condition with the
170-Hz fundamental frequency is identical to the roving-level condition in Experiment 1.)
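The claim about shared components can be checked directly: two harmonic series first coincide at the least common multiple of their fundamentals. The sketch below (helper names are hypothetical) lists the six ordered F0 assignments of the roving condition and the lowest shared harmonic for each F0 pair. It shows that 150, 170, and 190 Hz share no components below 2550 Hz; whether the stimuli reached that frequency depends on their upper cutoff, which this section does not state.

```python
from itertools import combinations, permutations
from math import gcd

F0S_HZ = (150, 170, 190)


def lowest_shared_harmonic(f1, f2):
    """Lowest frequency (Hz) that is a harmonic of both f1 and f2 (their LCM)."""
    return f1 * f2 // gcd(f1, f2)


# Roving condition: ordered (standard, test) F0 assignments, 3 x 2 = 6.
ROVING_PAIRS = list(permutations(F0S_HZ, 2))
print(len(ROVING_PAIRS), ROVING_PAIRS)

for f1, f2 in combinations(F0S_HZ, 2):
    print(f"{f1} and {f2} Hz first share a component at "
          f"{lowest_shared_harmonic(f1, f2)} Hz")
```

The three F0 pairs first coincide at 2550, 2850, and 3230 Hz, respectively.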

Results  and  Discussion 

Figure 4 shows discrimination performance for each subject under the two conditions in which the fundamental frequency was
fixed (solid curves with open circles) or randomly varied within a trial (dotted curves with closed circles). With the exception of
Subject 2 under the roving condition, the subjects under both conditions exhibited a generally linear growth in discrimination as a
function of an increase in the spectral slope difference. Figure 4g shows the mean d's across the five subjects (excluding Subject 2).
For d' values under 6.0, the two mean functions are linear (r² = 0.99 and 0.98 for the fixed and roving conditions, respectively), with
intercepts of -0.13 and -0.25. These results were confirmed by a significant main effect of spectral slope
difference in a 2 × 5 within-subject analysis of variance (F(4, 20) = 62.51, p < .05). The results clearly indicate that the subjects were
able to evaluate spectral slope despite variation in the fundamental frequency. Consider the roving condition, in which the stimuli
differed in their frequency components due to the change in the fundamental frequency. Across both conditions in this experiment, the
only relational property that remains constant regardless of variation in the fundamental frequency (and in overall stimulus intensity)
is spectral slope. Therefore, the subjects had to rely on the relational (or global) properties defining spectral slope to make correct
responses.


Insert  Figure  4  about  here 


Poorer discrimination performance was found in the roving fundamental frequency condition than in the fixed fundamental
frequency condition (F(1, 5) = 26.145, p < .05). The adverse effect of variation in the fundamental frequency was evident
across all six subjects, indicating that the decision on spectral slope was influenced to some degree by the difference in the
fundamental frequency.³ There was a nearly significant interaction between the levels of spectral slope difference and the two conditions
(F(4, 20) = 2.57, p < .06). Ignoring the highest levels of performance, the detrimental effect of roving fundamental frequency appeared as
a change in the slope of the psychometric function (4.4 versus 3.6 for the fixed and roving conditions), which is consistent
with added noise due to the roving condition. Substituting the slopes of the regression equations for d' in an adapted version of the
formula in Macmillan, Braida, and Goldberg (1987), the ratio of the variance due to roving fundamental frequency
to the variance associated with spectral slope discrimination (with roving overall intensity) is estimated at 0.49. Thus, for the specific range of
values in this experiment, roving fundamental frequency adds approximately 50% more variability to the decision process.

The results of this experiment indicate a moderate level (50% variance increase) of Garner interference, which is said to occur
when slower response times or lower accuracy are found for stimuli with incongruent values on paired dimensions relative to stimuli
with congruent values (Garner, 1974). Garner interference indicates a failure of selective attention to a relevant dimension due to
variation in an irrelevant dimension; the paired dimensions are then considered an integral perceptual unit. Because typical
studies of Garner interference use highly discriminable binary values on the paired stimulus dimensions, their results
cannot address the extent to which the perception of one dimension is influenced by the other. Rather, researchers
typically tend to draw more absolute conclusions about whether one stimulus dimension is perceptually accessible. The use of multiple
values on both the relevant and irrelevant dimensions in the current study has allowed a more detailed evaluation of the degree to which
spectral slope is separable from the fundamental frequency.

Experiment 3

In contrast to the use of two sound source properties (spectral slope and fundamental frequency) in Experiment 2, Experiment 3
investigates the perceptual constancy of spectral slope in the context of variation in the number of spectral peaks (which defines a
resonator characteristic). Because spectral slope and the resonator characteristic reflect different sound-producing components, the
detrimental effect of varying the irrelevant dimension may not occur in Experiment 3. Again, two conditions were used to determine
the effect of roving the resonator characteristic on spectral slope discrimination. In the fixed condition, the number of spectral peaks
was fixed within a trial but varied across trials. In the roving condition, the number of spectral peaks was varied both within a trial
and across trials.




Stimuli and Procedure

The six levels of spectral slope (from Experiment 1) and three levels of the number of spectral peaks (2, 5, and 8 peaks) were
combined to create 18 stimuli with a 170-Hz fundamental frequency. Spectral peaks were equally spaced across the frequency range.

The number of spectral peaks was determined in a pilot study in which the subjects discriminated stimuli generated by convolving a
flat spectral envelope (with a zero spectral slope) with filter transfer functions differing in the number of spectral peaks. The
subjects showed above 90% accuracy even when discriminating stimuli differing by only one spectral peak. Therefore, stimuli that
differed by three spectral peaks on a trial in the roving condition are highly discriminable from each other and should cause a
significant perceptual variation for the stimuli. The question is whether this significant perceptual variation has any effect on spectral
slope discrimination. A block of 60 trials was created using the pairing strategy adopted in Experiment 2 (including the roving of
overall spectral intensity). The procedure for roving the number of spectral peaks was identical to that for roving fundamental
frequency in Experiment 2. For each condition, the subjects completed 720 practice trials, then 12 blocks of 60 trials.

Results  and  Discussion 

Figure 5 shows the individual performance and the mean discrimination across the five subjects (excluding Subject 2) for each
condition as a function of the magnitude of the slope difference. With the usual exception of Subject 2 under the roving condition,
subjects' spectral slope discrimination improved with an increasing difference in slope, with nearly identical performance under
both conditions. Figure 5g is the mean psychometric function (again excluding Subject 2). Consistent with the findings from
Experiment 2, the improvement in performance with increasing slope difference was confirmed by a
main effect of spectral slope difference (F(4, 20) = 58.458, p < .001). The linear growth in d' (for d' < 6.0) with increasing
slope difference (r² = .99 for both conditions) indicated that the subjects again were able to use spectral slope as a cue to
differentiate degrees of spectral slope.


Insert Figure 5 about here


Although overall discrimination performance in the fixed condition was slightly better than that in the roving condition (F(1, 5) =
13.897, p < .013, based upon data from all six subjects), the planned comparison test showed that the significant statistical difference
occurred only for the stimulus pair with the largest slope difference, and was mostly due to the failure of Subject 2 in performing this
(or any) roving task. The mean psychometric functions (excluding Subject 2) for d' < 6.0 are essentially identical in slope (4.9 vs.
5.0) and intercept (0.13 vs. -0.41) for the fixed and roving conditions, respectively. Following Macmillan, Braida, and Goldberg
(1987; see Footnote 1), virtually no additional variance was added to the subjects' judgments by roving the number of spectral peaks
relative to the fixed condition. The lack of this roving effect is in sharp contrast with the significant difference across most levels of
the spectral difference in Experiment 2, and thus suggests that spectral slope discrimination is independent of the resonator
characteristic (at least when defined in terms of the number of spectral peaks).

In this experiment, listeners had to extract a global spectral property of slope on the basis of overall spectral tendency, while
"tolerating" variation in the fine structure of the spectral envelope. Although this result was obtained from nonspeech complex stimuli, it
is consistent with results from speech sounds. For example, Stevens and Blumstein (1981) reported that, despite variations in the spectra
of initial consonant releases, listeners used the overall tendency of spectral shape as a cue to classify stop consonants contrasted in place of
articulation. This result also parallels that of Gagne and Zurek (1988), who, measuring thresholds for formant discrimination while
varying the type of speech source (i.e., harmonic series and broadband noise), showed that listeners were able to discriminate
formant (thus resonant) frequency equally well for the harmonic and noise glottal sources. Although the two studies focused on
opposite stimulus properties (with the current study examining source discrimination in the context of variation in the resonator
characteristic), they seem to agree that sound source and resonator characteristics can be processed independently. In a
different type of study of sound source perception, Freed (1990) asked listeners to judge the hardness of a mallet when it struck
different types of pans, with the mallet and the pan serving as sound source and resonator, respectively. Freed found that listeners
were able to judge the hardness of the mallet independent of the type of pan. This finding is consistent with the results reported here.

Experiment  4 

Experiment 4 was designed to address two unresolved issues from the preceding experiments. First, a lack of performance
difference between the roving and the fixed conditions was found in Experiment 3 (varying spectral peaks), in contrast to a significant
difference in Experiment 2 (varying fundamental frequency). Because Experiment 2 preceded Experiment 3, the difference in roving
task performance might be explained in terms of differences in experience with the task. Experiment 4 examined this potential
confound by partially replicating Experiment 2 using a sample of the stimulus pairs. Second, because the current study computed
d's as a function of levels of slope difference for all stimulus pairs with a given magnitude of the difference, the d' values for the larger
slope differences were based upon fewer stimulus pairs than the d's for the smaller slope differences (i.e., for the range of slopes
between -0.50 and -1.75 dB in steps of -0.25, there are 5 comparisons differing by -0.25 dB, 4 differing by -0.50 dB, and only 1
differing by -1.25 dB). Experiment 4 examined whether this unequal sampling of slope differences may have influenced the stability of
the d's.
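The unequal sampling can be tallied directly. The sketch below counts the 15 distinct pairs formed from the six slope levels by size of slope difference, working in hundredths of a dB so that differences compare exactly.

```python
from collections import Counter
from itertools import combinations

# Six slope levels, -0.50 to -1.75 dB/harmonic, in hundredths of a dB.
SLOPES_CDB = (-50, -75, -100, -125, -150, -175)

pair_counts = Counter(abs(s1 - s2) for s1, s2 in combinations(SLOPES_CDB, 2))
total_pairs = sum(pair_counts.values())

for diff in sorted(pair_counts):
    share = pair_counts[diff] / total_pairs
    print(f"{diff / 100:.2f} dB: {pair_counts[diff]} pairs (p = {share:.2f})")
```

It gives 5, 4, 3, 2, and 1 pairs for differences of 0.25 through 1.25 dB, so a 1.25-dB difference makes up 1/15 (about 0.07) of the pairs, the probability cited for Experiment 2 in the discussion of Experiment 4's results.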




Stimuli  and  Procedure 

The 18 stimuli from Experiment 2 were used to generate five pairs of stimuli with the slope differences listed in Table 1. The
selection of the stimulus pairs was guided by (1) having each stimulus pair represent one level of the slope difference and (2) having
each level of the slope difference occur with approximately equal probability. These five pairs of stimuli were each presented
ten times, in random order, in a block of 50 trials. For each condition, the six subjects completed six blocks of trials in a one-hour session. The
order of running the two conditions was counterbalanced across subjects.
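The Experiment 4 block structure is simple enough to sketch. Pair labels 1 to 5 stand for the Table 1 pairs (their slope values are not reproduced here), and alternating condition order across subjects is one assumed counterbalancing scheme; the report does not say which scheme was used. With one pair per slope-difference level, each level occurs with probability 0.20.

```python
import random

N_PAIRS = 5          # stimulus pairs from Table 1, one per slope-difference level
REPS_PER_BLOCK = 10  # each pair presented ten times: 5 x 10 = 50 trials


def make_exp4_block(rng):
    """Return a randomly ordered 50-trial block of pair labels 1-5."""
    block = [pair for pair in range(1, N_PAIRS + 1) for _ in range(REPS_PER_BLOCK)]
    rng.shuffle(block)
    return block


def condition_order(subject_index):
    """Assumed counterbalancing: alternate which condition runs first."""
    if subject_index % 2 == 0:
        return ("fixed", "roving")
    return ("roving", "fixed")


block = make_exp4_block(random.Random(7))
orders = [condition_order(i) for i in range(6)]  # six subjects
```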


Insert  Table  1  about  here 


Results  and  Discussion 

The results from this replication of Experiment 2 were quite similar to those found originally in Experiment 2. A 2 × 5 within-
subject analysis of variance was conducted on discrimination performance. The significant main effect of slope difference again
indicated that discrimination performance differed among the levels of slope difference (F(4, 20) = 39.287, p < .05). The main
effect of condition showed that discrimination performance was significantly lower in the roving condition than in the fixed condition
(F(1, 5) = 60.406, p < .05). Figure 6 shows a significant difference in the d's for the largest slope difference (stimulus pair 5 from
Table 1) between the fixed and roving conditions. The similarity between the psychometric functions of Experiments 2 and 4
indicates that the results of the preceding experiments had not been significantly biased by the unequal samples of data used to
compute the d's.

Insert  Figure  6  about  here 

The new results for the fixed discrimination condition yielded a linear psychometric function (r² = 0.98), with a slope of 4.9,
which is identical to the psychometric functions for the equivalent conditions in Experiments 1 and 3. The slightly depressed level of
performance found in the original fixed condition (Experiment 2) thus would seem to be attributable to a relative lack of experience
with the roving task. The roving condition also exhibited a linear psychometric function (r² = 0.99), but with a somewhat shallower
slope than the psychometric function obtained in Experiment 2 (2.9 versus 3.6). Because the total range of spectral slopes sampled in
Experiments 2 and 4 was identical (-0.50 to -1.75 dB per harmonic), the difference in slope of the psychometric functions cannot be
attributed to a simple range-related difference in context variance across the two tasks. However, it is quite possible that subjects
were bothered more by the increased frequency of a larger change in the irrelevant dimension (e.g., a change in spectral slope of
-1.25 occurred with a probability of 0.07 in Experiment 2 and a probability of 0.20 in Experiment 4). The persistence of this large
detrimental effect of variation in the fundamental frequency suggests that the absence of any decrement in Experiment 3 cannot be
attributed to practice. The results of Experiment 4 thus strengthen the speculation that spectral slope is more easily separated
from the resonator characteristic than from the fundamental frequency.

The finding that perception of sound source is relatively independent of the resonator property becomes more compelling when
examining the relevant results across the four experiments. Figure 7 re-plots the mean discrimination performance for the fixed
conditions (with a fixed irrelevant dimension, but roving amplitude) and the mean discrimination performance for the roving
conditions (with a roving irrelevant dimension) from Experiments 2 (roving F0) and 3 (roving filter function). Figure 7 shows that
spectral slope discrimination was unaffected by varying the number of spectral peaks (Experiment 3) as an irrelevant
dimension, but was significantly impaired by varying the fundamental frequency (Experiment 2 and also Experiment 4).


Insert  Figure  7  about  here 


One account for this asymmetric result can be formulated in terms of the sound production model. The current study did not
arbitrarily choose two acoustic variables as the irrelevant dimensions. Rather, Experiment 2 used a definite sound source property
(the fundamental frequency) as the irrelevant dimension, whereas Experiment 3 used a resonator characteristic (the number of spectral
peaks) as the irrelevant dimension. Speech research has shown that many glottal source properties of speech are determined by both
spectral slope and fundamental frequency together. For example, various types of vocal registers differ in fundamental frequency
and spectral slope (Childers & Lee, 1991; Hollien, 1974). Moreover, speech produced by male and female speakers also differs in
both fundamental frequency and spectral slope. The various speaking characteristics are thus determined by a set of congruent values
of glottal acoustic dimensions. It is unlikely that incongruent values of glottal acoustic dimensions will invoke an unambiguous
perception of speaker identity. Therefore, listeners may treat all sound source properties as a perceptual unit that, as a whole,
describes the characteristics of the sound production system. However, analyzing the functions of sound source and filter presents a
different picture. For speech sounds, different linguistic messages are generated by maneuvering the vocal apparatus (e.g., tongue,
jaw, lips, teeth), which, in turn, alters the resonance or filter properties. Regardless of what linguistic messages a speaker
produces (by changing the filter properties), the speaking characteristics of the source always remain constant. This relative
independence of the glottal source and vocal tract configuration may dictate the absence of the effect of variation in the number of
spectral peaks on spectral slope discrimination.


General  Discussion 

The current study was an attempt to use a psychophysical procedure to investigate the perception of complex, global stimulus attributes that have been neglected in past psychoacoustics research. Evidence suggests that research focusing on detailed acoustic properties may not be able to successfully predict performance in the perception of speech and other complex, naturally occurring sounds.

Because past psychoacoustics research focuses on the abilities of the auditory peripheral system to detect or discriminate changes in elementary acoustic properties, such as the pitch, loudness, phase, and duration of a signal, it has been called the Fourier analysis approach (Gaver, 1993). Understanding listeners' abilities to process elementary acoustic properties has taught researchers a great deal about elemental aspects of perception and about the physiological underpinnings of auditory information processing. However, our auditory system also needs to recover more global stimulus properties, such as the source characteristics of a sound-producing system, by listening to the sound produced by that system (e.g., Freed, 1990; Gaver, 1993; Li et al., 1991; Repp, 1987; Warren & Verbrugge, 1984). It is doubtful that studying the processing of elementary acoustic properties is sufficient to promote a full understanding of the perception of naturally occurring sounds. Although we do not agree with the extreme position that studying the processing of elementary properties in the hope of understanding the perception of naturally occurring sounds is as remote from its goal as studying the processing of letter features in the hope of understanding reading comprehension (Gaver, 1993), there clearly is merit in moving to a higher level of analysis when trying to understand more global aspects of perception.

The source-filter analysis approach, used in the current study, provides one alternative to the Fourier analysis approach. Rather than using elementary acoustic properties as stimulus dimensions, the source-filter approach assumes that a complex sound, particularly a naturally occurring sound, is produced by an interaction of sound-producing components such as a power source, an oscillator, a resonator, and a coupler. A sound is produced by vibrating the oscillator, which is excited by the power source. The sound is then shaped by the resonator, where the spectrum of the source is tuned to the resonator characteristics. Thus, the acoustic (spectral and temporal) effects of these sound-producing components should be the stimulus dimensions in a laboratory study of the perception of naturally occurring sounds. Although the source-filter model describes sound production, the model also provides a guide for investigating the perception of the characteristics of sound-producing components. To implement this research scheme, two types of perceptual questions must be asked: (1) whether the acoustic effect of each component is perceptually accessible; and (2) what the nature of the perceptual interaction among the components is. To investigate the first question, the physics of each component must be understood, and the acoustic effect of each component should then be examined by varying the different physical parameters that define the component. Because not every acoustic effect is perceptually meaningful, a perceptual investigation needs to follow the acoustic analysis. Psychophysical analysis of the acoustic effect can help determine the mapping from the physical parameter of the sound-producing component to the acoustic effect, and then to the perceptual effect.
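The source-filter idea can be made concrete in the spectral domain: the resonator multiplies the source amplitudes, which amounts to adding its gain (in dB) to each harmonic's level. The sketch below is a minimal illustration under stated assumptions, not the stimulus-generation code used in these experiments. It assumes a source whose level falls a fixed number of dB per harmonic (the definition given in Footnote 3) and a hypothetical resonator modeled as a cosine "ripple" with a chosen number of peaks and an assumed 10-dB depth; the function names are illustrative.

```python
import numpy as np

def source_levels_db(n_harmonics, slope_db_per_harmonic):
    """Source spectrum: harmonic k (k = 1..n) sits slope*(k-1) dB re harmonic 1."""
    k = np.arange(1, n_harmonics + 1)
    return slope_db_per_harmonic * (k - 1)

def ripple_gain_db(n_harmonics, n_peaks, depth_db=10.0):
    """Hypothetical resonator: cosine ripple with n_peaks maxima across the harmonics.

    Gain is 0 dB at each peak and -depth_db at each trough (depth is an assumption).
    """
    k = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * n_peaks * (k - 1) / n_harmonics
    return 0.5 * depth_db * (np.cos(phase) - 1.0)

def output_levels_db(n_harmonics, slope_db_per_harmonic, n_peaks):
    """Source-filter composition: multiplying amplitudes equals adding levels in dB."""
    return (source_levels_db(n_harmonics, slope_db_per_harmonic)
            + ripple_gain_db(n_harmonics, n_peaks))

# A 20-harmonic source with a -1 dB/harmonic slope shaped by a five-peak resonator.
out = output_levels_db(20, -1.0, 5)
```

Because the filter enters only as an additive dB term, changing the number of peaks alters the output spectrum while leaving the source component itself untouched, which is the sense in which source and filter are separable in this model.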

The first question focuses on individual components of a sound-producing system. The current study attempted to answer the second question: how the sound-producing components may interact to influence our perception of source characteristics. We varied two types of source properties and a filter property with the intent of understanding the perceptual interaction between source and filter properties. Relative to the impact exerted by the fundamental frequency (another source property), the filter property had less influence upon the perceptual resolution (discrimination) of spectral slope (a source property). The results supported the claim that the source and filter properties were relatively independent (at least for the types of source and filter properties used in this study). However, it remains unclear whether this finding can be generalized to any combination of source and filter properties. Although the validity of this finding likely depends on the types of source and filter properties (i.e., transfer functions) used in this study, the general research approach is useful for studying the perception of complex sounds, especially naturally occurring sounds. For example, in studying gender judgments made by listening to the sounds of human footsteps (Li et al., 1991), this approach can be used to determine what proportions of shoe (source) and walking surface (resonator) factors contribute to the gender judgment of a walker. This research is now underway in Pastore's laboratory at SUNY-Binghamton.

Although this discussion emphasizes the importance and necessity of the source-filter analysis approach, both the Fourier analysis and the source-filter analysis approaches represent important, but different, levels of analysis in the investigation of human audition. In vision, Marr (1982) proposed three different data structures: (1) the primal sketch, (2) the 2½-D sketch, and (3) the 3-D model. An auditory analog of Marr's theory was developed by Richards (1988). The primal sketch is the waveform or spectrum representation of the signal, and the Fourier analysis approach helps us to understand the resolution of various acoustic properties in such a representation. In vision, the 2½-D sketch consists of "visible" surface properties; in audition, the 2½-D sketch may include sound localization and properties of source and filter (e.g., harmonic source, noise source; cylindrical resonator and drum; metal, wood, and glass material). The source-filter analysis approach can be used to investigate the separation of source and filter properties as well as the perceptual relationships among them. Finally, the 3-D model is a vivid and coherent description of the sound in a three-dimensional space. This broad theoretical scheme may provide a direction for us to eventually understand the perception of complex, naturally occurring sounds, and to organize what we already know about the perception of complex sounds. The specificity and validity of this research scheme can be tested only by further empirical research and theoretical advancement.


References


Bernstein, L. R., & Green, D. M. (1987). Detection of simple and complex changes of spectral shape. Journal of the Acoustical Society of America, 82, 1587-1592.

Blumstein, S. E., & Stevens, K. N. (1979). Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. Journal of the Acoustical Society of America, 66, 1001-1017.

Childers, D. G., & Lee, C. K. (1991). Vocal quality factors: Analysis, synthesis, and perception. Journal of the Acoustical Society of America, 90, 2394-2410.

Chq, 1 L., Hall, M. D., & Pastore, R. E. (1991). Normalization process in the human auditory system. Journal of the Acoustical Society of America, 89, pt. 2, 198*.

Christopherson, L. A. (1992). Some psychometric properties of the Test of Basic Auditory Capabilities (TBAC). Journal of Speech and Hearing Research, 35, 929-935.

Cohen, M. F., & Schubert, E. D. (1987). The effect of cross-spectrum correlation on the detectability of a noise band. Journal of the Acoustical Society of America, 81, 721-723.

Durlach, N. I., Tan, H. Z., Macmillan, N. A., Rabinowitz, W. M., & Braida, L. D. (1989). Resolution in one dimension with random variations in background dimensions. Perception & Psychophysics, 46, 293-296.

Fant, G. (1960). Acoustic Theory of Speech Production. The Hague: Mouton.

Freed, D. J. (1990). Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. Journal of the Acoustical Society of America, 87, 311-322.

Gagne, J., & Zurek, P. M. (1988). Resonance-frequency discrimination. Journal of the Acoustical Society of America, 83, 2293-2299.

Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.

Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5, 1-29.

Green, D. M. (1988). Profile Analysis: Auditory Intensity Discrimination. New York: Oxford.

Hall, J. W., Haggard, M. P., & Fernandes, M. A. (1984). Detection in noise by spectro-temporal pattern analysis. Journal of the Acoustical Society of America, 76, 50-66.

Hollien, H. (1974). On vocal registers. Journal of Phonetics, 2, 125-143.

Kidd, G. R., & Watson, C. S. (1989). Detection of relative-frequency changes in tonal patterns. Journal of the Acoustical Society of America, 86, S121.

Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America, 87, 820-857.

Li, X., Logan, R. J., & Pastore, R. E. (1991). Perception of acoustic source characteristics: Walking sounds. Journal of the Acoustical Society of America, 90, 3036-3049.

Macmillan, N. A., Braida, L. D., & Goldberg, R. F. (1987). Central and peripheral processes in the perception of speech and nonspeech sounds. In M. E. H. Schouten (Ed.), The Psychophysics of Speech Perception. The Hague: Nijhoff.

Macmillan, N. A., & Creelman, C. D. (1990). Detection Theory: A User's Guide. Cambridge: Cambridge University Press.

Marr, D. (1982). Vision. San Francisco: Freeman.

Mason, C. R., Kidd, G., Jr., Hanna, T. E., & Green, D. M. (1984). Profile analysis and level variation. Hearing Research, 13, 269-275.




McAdams, S. (1993). Recognition of sound sources and events. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology of Human Audition. New York: Oxford.

McFadden, D. (1987). Comodulation detection differences using noise-band signals. Journal of the Acoustical Society of America, 81, 1519-1527.

Monsen, R. B., & Engebretson, A. M. (1977). Study of variations in the male and female glottal wave. Journal of the Acoustical Society of America, 62, 981-993.

Pastore, R. E. (1981). Possible psychoacoustic factors in speech perception. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Lawrence Erlbaum.

Pierce, J. R., & Gilbert, E. N. (1958). On AX and ABX limens. Journal of the Acoustical Society of America, 30, 593-695.

Price, P. J. (1989). Male and female voice source characteristics: Inverse filtering results. Speech Communication, 8, 261-277.

Repp, B. H. (1987). The sound of two hands clapping: An exploratory study. Journal of the Acoustical Society of America, 81, 1100-1110.

Richards, W. (1988). Sound interpretation. In W. Richards (Ed.), Natural Computation. Cambridge, MA: MIT Press.

Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.

Stevens, K. N., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358-1365.

Stevens, K. N., & Blumstein, S. E. (1981). The search for invariant acoustic correlates of phonetic features. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Lawrence Erlbaum.

Warren, W. H., & Verbrugge, R. R. (1984). Auditory perception of breaking and bouncing events: A case study in ecological acoustics. Journal of Experimental Psychology: Human Perception and Performance, 10, 704-712.

Watson, C. S. (1991). Auditory perceptual learning and the cochlear implant. The American Journal of Otology, Supplement.

Watson, C. S., Qiu, W. W., Chamberlain, M. M., & Li, X. (1993). Auditory and visual speech perception: Confirmation of a modality-independent source of individual differences. Journal of the Acoustical Society of America (under review).

Wright, B. A. (1990). Comodulation detection differences with multiple signal bands. Journal of the Acoustical Society of America, 87, 292-303.

Woods, W. S., & Colburn, H. S. (1992). Test of a model of auditory object formation using intensity and interaural time difference discrimination. Journal of the Acoustical Society of America, 91, 2894-2902.

Yost, W. A., & Sheft, S. (1989). Across-critical-band processing of amplitude-modulated tones. Journal of the Acoustical Society of America, 85, 848-857.

Yost, W. A., Sheft, S., & Opie, J. (1989). Modulation interference in detection and discrimination of amplitude modulation. Journal of the Acoustical Society of America, 86, 2138-2147.


Author  Note 

This research was supported in part by the NSF and AFOSR grants awarded to Richard E. Pastore, and was facilitated by assistance from the Center for Cognitive and Psycholinguistic Sciences at the State University of New York at Binghamton. The preparation of this manuscript by the first author was funded by the NIH and AFOSR grants awarded to Charles S. Watson at Indiana University. Any opinions, findings, and conclusions expressed in this publication are those of the authors and do not reflect the views of the funding agencies. The first author is currently affiliated with the Center for Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University. Requests for reprints should be sent to Xiaofeng Li, The Center for Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218.




Footnotes 

1 In all of the experiments, only Subject 2 failed to exhibit reasonable levels of performance under the roving conditions. The fixed condition of Experiment 2 is essentially equivalent to the roving-level condition in Experiment 1, yet the performance of Subject 2 in that condition was equivalent to that of the other subjects. Therefore, it would appear that Subject 2 failed to understand the requirements of the roving conditions.

2 Macmillan, Braida, and Goldberg (1987) used the following formula to evaluate the relative variance contributed by sensory and context coding:

$$\frac{\text{Context Variance}}{\text{Sensory Variance}} = \left(\frac{d'_{\text{fixed}}}{d'_{\text{roving}}}\right)^{2} - 1 \qquad (3)$$

We substituted the ratio of d' slopes from the fixed and roving conditions for the d' ratio in this formula to evaluate the additional variance introduced by roving an irrelevant dimension.
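Equation (3) follows from assuming that d' in the fixed condition is Δ/σ_S (sensory noise only), while in the roving condition it is Δ/√(σ_S² + σ_C²). Squaring the ratio of the two gives 1 + σ_C²/σ_S², and subtracting 1 leaves the context-to-sensory variance ratio. The sketch below computes that ratio from d' slopes. How the original slopes were fitted is not stated here, so the least-squares fit through the origin is an assumption, and the d' values are hypothetical numbers chosen for illustration, not data from this study.

```python
import numpy as np

def dprime_slope(slope_differences, dprimes):
    """Least-squares slope of d' vs. within-trial slope difference, fit through the origin (assumption)."""
    x = np.asarray(slope_differences, dtype=float)
    y = np.asarray(dprimes, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

def context_to_sensory_variance(dprime_fixed, dprime_roving):
    """Equation (3): context variance / sensory variance = (d'_fixed / d'_roving)**2 - 1."""
    return (dprime_fixed / dprime_roving) ** 2 - 1.0

# Hypothetical d' values at the slope differences listed in Table 1.
deltas = [0.25, 0.50, 0.75, 1.00, 1.25]
fixed_slope = dprime_slope(deltas, [0.5, 1.0, 1.5, 2.0, 2.5])
roving_slope = dprime_slope(deltas, [0.25, 0.5, 0.75, 1.0, 1.25])
ratio = context_to_sensory_variance(fixed_slope, roving_slope)
```

In this made-up case roving halves the d' slope, which corresponds to context variance three times as large as sensory variance.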

3 In this research, we have defined spectral slope in terms of change in dB per harmonic of the fundamental frequency. This definition makes a great deal of sense for a production system whose glottal source varies in fundamental frequency. This definition of spectral slope also means that the spectral envelope is maintained across F0 when defined as a function of octave, or of any other behaviorally or physiologically relevant unit (usually a logarithmic function of frequency). Thus, a change in the frequency region of the signal, accomplished by shifting F0, should maintain equivalent changes in intensity per octave or critical band as a function of increasing frequency. Although changes in both F0 and ripple (filter) frequency will alter the absolute amplitude of energy in any given frequency region (defined in terms of either fixed or behaviorally relevant units), the roving of amplitude under all conditions makes absolute amplitude cues irrelevant to the discrimination task. However, if one believes that spectral slope is better defined in absolute physical units (change in dB per fixed unit of frequency, for example, average change in dB per Hz), then F0 and spectral slope are related variables, whereas ripple frequency (defined across the full spectrum of the signal) and spectral slope are independent variables. Because we are evaluating perception, and the applicability of the source-filter model to perception, defining terms in behaviorally relevant units is the most logical strategy.
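The point of this footnote can be checked numerically. With slope defined in dB per harmonic, doubling F0 only shifts the whole envelope up one octave on a log-frequency axis, while the same stimuli expressed in dB per Hz have a slope that changes with F0. The sketch below assumes a 20-harmonic complex (as in Figure 1), a -1 dB/harmonic slope, and F0 values of 100 and 200 Hz, all chosen purely for illustration.

```python
import numpy as np

def harmonic_spectrum(f0_hz, slope_db_per_harmonic, n_harmonics=20):
    """Frequencies (Hz) and levels (dB re harmonic 1) when slope is defined in dB per harmonic."""
    k = np.arange(1, n_harmonics + 1)
    return f0_hz * k, slope_db_per_harmonic * (k - 1)

f_low, levels_low = harmonic_spectrum(100.0, -1.0)
f_high, levels_high = harmonic_spectrum(200.0, -1.0)

# Measured in octaves above F0, both spectra have the same envelope:
# doubling F0 moves the pattern up one octave without changing its shape.
same_envelope = (np.allclose(np.log2(f_low / 100.0), np.log2(f_high / 200.0))
                 and np.allclose(levels_low, levels_high))

# The same stimuli described in absolute units (average dB per Hz) co-vary with F0.
db_per_hz_low = (levels_low[-1] - levels_low[0]) / (f_low[-1] - f_low[0])      # -0.01 dB/Hz
db_per_hz_high = (levels_high[-1] - levels_high[0]) / (f_high[-1] - f_high[0])  # -0.005 dB/Hz
```

In general the absolute slope works out to (dB per harmonic) / F0, which is why F0 and spectral slope become related variables under a dB-per-Hz definition.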

Although future research might try to eliminate the confound (when defined in physical terms) between F0 and spectral slope, the task is not simple. For example, defining spectral slope in terms of a constant change in amplitude per Hz, by its very nature, forces spectral slope to be correlated with frequency when interpreted in terms of any of the more behaviorally relevant definitions of slope (e.g., dB per critical band). Therefore, running the same task with this alternative definition of spectral slope should result in lower performance in the fixed condition (due to the behaviorally nonlinear nature of the resulting spectral slope) and an even greater drop in performance under the roving F0 condition (due to the correlated change in the behaviorally relevant slope). We thus believe that we have evaluated the relevance of two potentially independent perceptual variables to the perception of a single perceptual variable.
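The claim that a constant dB-per-Hz slope becomes frequency-dependent on a behavioral scale follows from simple arithmetic: an octave starting at f Hz is f Hz wide, so a slope of s dB/Hz produces a change of s·f dB across it, and that change doubles with every octave. The sketch below uses octaves as a stand-in for a logarithmic, behaviorally relevant unit (treating critical bands as roughly logarithmic is an approximation), and the -0.005 dB/Hz slope is a hypothetical value.

```python
def db_per_octave(slope_db_per_hz, octave_start_hz):
    """Level change across the octave [f, 2f] for a spectrum with a constant slope in dB per Hz.

    The octave is f Hz wide, so the change is slope * f dB.
    """
    return slope_db_per_hz * (2.0 * octave_start_hz - octave_start_hz)

# Hypothetical constant slope of -0.005 dB/Hz, evaluated over successive octaves.
changes = {f: db_per_octave(-0.005, f) for f in (250.0, 500.0, 1000.0, 2000.0)}
```

The per-octave change grows from about -1.25 dB at 250 Hz to about -10 dB at 2000 Hz, so the "constant" physical slope is anything but constant on a logarithmic axis.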


Table 1. The Values of Spectral Slope for Stimulus Pairs Presented on a Single Trial for Experiment 4. Slope 1 and Slope 2 were randomly assigned to the standard and test stimuli on a trial.

Stimulus Pair    Slope 1 (dB/harmonic)    Slope 2 (dB/harmonic)    Slope Difference
1                -1.00                    -1.25                    0.25
2                -0.75                    -1.25                    0.50
3                -0.75                    -1.50                    0.75
4                -0.50                    -1.50                    1.00
5                -0.50                    -1.75                    1.25
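The random assignment described in the Table 1 caption can be sketched as follows. The slope pairs are transcribed from Table 1; the dictionary layout, the function name `make_trial`, and the 50/50 coin flip are illustrative assumptions about how such an assignment might be implemented, and other trial details (such as amplitude roving) are deliberately left out.

```python
import random

# Slope pairs (dB per harmonic) transcribed from Table 1.
SLOPE_PAIRS = {
    1: (-1.00, -1.25),
    2: (-0.75, -1.25),
    3: (-0.75, -1.50),
    4: (-0.50, -1.50),
    5: (-0.50, -1.75),
}

def make_trial(pair_id, rng=random):
    """Randomly assign Slope 1 and Slope 2 of one pair to the standard and test stimuli."""
    slope_1, slope_2 = SLOPE_PAIRS[pair_id]
    if rng.random() < 0.5:
        return {"standard": slope_1, "test": slope_2}
    return {"standard": slope_2, "test": slope_1}

rng = random.Random(1)  # seeded so the sequence of assignments is reproducible
trial = make_trial(5, rng)
```

Whichever order is drawn, the within-trial slope difference stays at the value listed in the table, so only the order, not the size, of the difference is randomized.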




Figure Captions

Figure 1. Schematic representation of the stimuli. The top panel shows a complex sound of 20 harmonically related components (shown by the vertical lines) with a decreasing spectral envelope. The abscissa is the harmonic number, and the ordinate is the spectral intensity. The bottom panel illustrates a stimulus produced by passing the stimulus in the top panel through a five-spectral-peak filter.
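A stimulus like the one in the top panel of Figure 1 can be synthesized as a sum of harmonics whose amplitudes follow the chosen spectral envelope. The sketch below is a minimal illustration, not the authors' synthesis procedure. The F0 of 200 Hz, the 0.5-s duration, the 44.1-kHz sample rate, sine starting phase, peak normalization, and the -1 dB/harmonic envelope are all assumptions.

```python
import numpy as np

def harmonic_complex(f0_hz, levels_db, duration_s=0.5, sample_rate=44100):
    """Sine-phase sum of harmonics k*f0 (k = 1..N) with amplitudes 10**(level/20), peak-normalized."""
    n_harm = len(levels_db)
    if n_harm * f0_hz >= sample_rate / 2:
        raise ValueError("highest harmonic must lie below the Nyquist frequency")
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    k = np.arange(1, n_harm + 1)[:, np.newaxis]                  # harmonic numbers as a column
    amps = 10.0 ** (np.asarray(levels_db, dtype=float) / 20.0)[:, np.newaxis]
    wave = np.sum(amps * np.sin(2.0 * np.pi * k * f0_hz * t), axis=0)
    return wave / np.max(np.abs(wave))                           # scale peak amplitude to 1

# Illustrative top-panel stimulus: 20 harmonics, level falling 1 dB per harmonic.
levels = -1.0 * np.arange(20)
wave = harmonic_complex(200.0, levels)
```

Passing `levels` through a filter gain in dB before synthesis (adding the two level vectors) would give a bottom-panel style stimulus; the filter shape itself is not specified in the caption.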

Figure 2. Illustration of the trial structure for Experiments 2, 3, and 4. The top panel is the trial structure for Experiment 2, whereas the bottom panel is for Experiment 3. For each stimulus, shown by a schematic spectrum, the vertical lines are the 20 harmonic tones, with the spectral intensity specified on the ordinate. For a detailed description of the trial structure, see the text.

Figure 3. Results for Experiment 1. The first six graphs are the results for the individual subjects. The last graph is the mean across the five subjects (excluding Subject 2). For each graph, the abscissa is the difference in spectral slope within a trial, and the ordinate is the d' value. The solid curves with open circles are the results for the fixed-level condition, and the dotted curves with closed circles are those for the roving-level condition.

Figure 4. Results for Experiment 2. The solid curves with open circles are the results for the fixed irrelevant dimension (fundamental frequency) condition, and the dotted curves with closed circles are those for the roving irrelevant dimension condition.

Figure 5. Results for Experiment 3. The solid curves with open circles are the results for the fixed irrelevant dimension (the number of spectral peaks) condition, and the dotted curves with closed circles are those for the roving irrelevant dimension condition.

Figure 6. Results for the replication of Experiment 2. The solid curves with open circles are the results for the fixed irrelevant dimension (fundamental frequency) condition, and the dotted curves with closed circles are those for the roving irrelevant dimension condition.

Figure 7. Comparison of decrements caused by variation in the irrelevant dimensions. The left panel varied the fundamental frequency as the irrelevant dimension (Experiment 2), whereas the right panel varied the number of spectral peaks as the irrelevant dimension (Experiment 3).


[Figure panel: Irrelevant dimension = fundamental frequency]


Within-category discrimination of musical chords: Perceptual magnet or anchor?


Barbara E. Acker, Richard E. Pastore, and Michael D. Hall


Abstract 

Recent speech research (e.g., Lauckner-Morano & Sussman, 1993; Lively, 1993; Kuhl, 1991) has demonstrated that the presence of prototypes may be reflected in the internal structure of speech categories. The current study examines the function of prototypes in another natural, but nonspeech, category. Prototype (P) and nonprototype (NP) sets of major triad stimuli were constructed, with stimuli in the P set being more representative of the category than the NP stimuli. Musically experienced subjects rated the stimuli in each set for goodness as a major triad, with the highest rated stimulus serving as a prototype standard for a subsequent discrimination task. Results from the discrimination task demonstrated better performance in the prototype context. In contrast, some speech results (Kuhl, 1991) found lower discrimination in a vowel P context than in a NP context, suggesting that a prototype may function as a perceptual magnet, effectively decreasing the perceptual distance, and thus the discriminability, between stimuli. The current nonspeech results appear to follow predictions based on classification and perceptual models, and provide a natural, nonspeech contrast to speech findings.

End of Abstract


Running Head: Chord Discrimination


Although it has long been conjectured that speech categorization may be based upon the use of prototypes or exemplars, most speech perception research has focused on the location of category boundaries, with little attention given to the perceptual changes which, in theory, should exist within categories that are based upon the use of prototypes. This focus on category boundaries probably is a carry-over from the notions of categorical perception, which posited absolute recoding of perception in terms of discrete phonetic categories, with any within-category variation dismissed as stimulus artifact (Studdert-Kennedy, Liberman, Harris, & Cooper, 1970). In contrast to this long tradition of categorization studies based on labeling tasks, some recent research has begun to examine the internal structure of speech categories (Kuhl, 1991; Lauckner-Morano & Sussman, 1993; Li & Pastore, 1992; Lively, 1993; Samuel, 1982; Volaitis & Miller, 1992). In general, category membership is found to be qualitatively graded, with Kuhl (1991) providing evidence that quality of membership is reflected in the discriminability between stimuli. Qualitative grading and patterns of discrimination within categories are probably indicative of general category structure and should also occur in nonspeech categories. The current research evaluates quality of category membership and discriminability for a musical category, another natural, but nonspeech, category.
Prototype and Exemplar Notions

Prototype and exemplar models are currently the two predominant approaches to modeling perceptual categorization, each addressing the nature of category members somewhat differently. Prototype theory (e.g., Posner & Keele, 1968) attributes categorization to the comparison of incoming stimuli to internal prototypes, which are some form of averaged or ideal category representation. Although the exact nature of a prototype may differ across models, there is some agreement that experiences with a particular category contribute to defining the category prototype, and that categorization is usually determined by the relative match (or perceptual distance) between the incoming stimulus and the prototype. As similarity to the prototype decreases (i.e., as perceptual distance increases), the quality of category membership decreases, and the probability of assignment to the given category should decrease. Furthermore, as stimuli increase in similarity to (and thus decrease in perceptual distance from) the prototypes for other categories, the probability of assignment to those other categories should increase.

Exemplar theory (e.g., Medin & Schaffer, 1978; Nosofsky, 1991) proposes that experiences are stored in memory and that categorization is determined by the set of exemplars elicited by the incoming stimulus. Exemplars for a given category are specific instances of a stimulus, rather than a single, averaged representation of experienced stimuli. In modeling categorization, the incoming stimulus is assumed to be categorized according to its degree of similarity to the stored exemplars. If similarity is high relative to the pool of exemplars for a given category, the incoming stimulus will be assigned to that category. Although it has often been argued that categorization findings are better described by exemplar, rather than prototype, models, most such research has focused on limited, often artificially defined, categories. Li & Pastore (1992) note that differentiating between exemplar and prototype models for natural categories (e.g., stop consonants) can be very difficult (also see Nosofsky, Palmeri, & McKinley, 1994). Furthermore, for highly learned categories (such as pitch categories for musicians), assuming the storage of memories of all specific instances appears to require an unreasonable memory load and lengthy search processes, and is therefore difficult to model.

One reasonable approach to modeling exemplar notions about natural or highly learned categories would be to look at the central tendencies in the probability distribution of sampled exemplars. It is logical to assume that categories should have a higher concentration of good exemplars than of poor exemplars. Therefore, [...] good exemplars being elicited by the incoming stimulus should increase. Although the difference between exemplar and prototype notions [...] evaluating stimuli by sampling from a pool of stored exemplars rather than by comparison to a prototype.
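The contrast between the two accounts can be sketched in a few lines. The version below is an illustrative assumption, not a model fitted in this study: similarity is taken to decay exponentially with distance (as in Nosofsky's generalized context model), categories are chosen with the Luce choice rule, the prototype is simply the mean of a category's exemplars, and the sensitivity `c` and the two one-dimensional categories are hypothetical.

```python
import numpy as np

def prototype_similarity(stimulus, exemplars, c=1.0):
    """Prototype account: similarity to a single averaged representation (the category mean)."""
    prototype = np.mean(np.asarray(exemplars, dtype=float), axis=0)
    distance = np.linalg.norm(np.asarray(stimulus, dtype=float) - prototype)
    return float(np.exp(-c * distance))

def exemplar_similarity(stimulus, exemplars, c=1.0):
    """Exemplar account: similarity summed over every stored instance (exponential decay)."""
    diffs = np.asarray(exemplars, dtype=float) - np.asarray(stimulus, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    return float(np.sum(np.exp(-c * distances)))

def choice_probabilities(similarities):
    """Luce choice rule: each category's similarity relative to the total across categories."""
    s = np.asarray(similarities, dtype=float)
    return s / s.sum()

# Two hypothetical one-dimensional categories and a stimulus near the first one.
category_a = [[0.0], [1.0], [2.0]]
category_b = [[5.0], [6.0], [7.0]]
stimulus = [1.5]
p_proto = choice_probabilities([prototype_similarity(stimulus, category_a),
                                prototype_similarity(stimulus, category_b)])
p_exemplar = choice_probabilities([exemplar_similarity(stimulus, category_a),
                                   exemplar_similarity(stimulus, category_b)])
```

For well-separated categories like these, both accounts assign the stimulus to the nearer category with high probability, which illustrates why the two can be hard to tell apart for natural categories.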

Several predictions can be made regarding the pattern of discrimination within a category structured around a prototype. Viewed in terms of the auditory perception model proposed by Braida, Lim, Berliner, Durlach, Rabinowitz, & Purks (1984), a prototype should function as an interior anchor. Discrimination should therefore be enhanced for a prototype set of stimuli (stimuli near a prototype). In contrast, a nonprototype (NP) set should consist of stimuli which, while from the same natural category, are distant from the given category prototype (and from any alternative category prototypes) and are poor examples of the category. These stimuli should have been experienced less frequently, and thus should have a sparse distribution of similar exemplars, which are not common representatives of the category. Therefore, there is no reason to posit the formation of anchors in a NP context. Thus, if anchors provide a basis for discrimination, the absence of an anchor in a NP context should result in poor discrimination between stimuli in a NP category.

Following another line of reasoning, if discrimination is modeled on the basis of perceptual distance from a standard (e.g., a
prototype or anchor) and obeys any sort of approximation to Weber's law, stimuli differing by a constant amount (Δd) from each other
should be most discriminable if they are similar to, and thus a small distance from, the prototype (i.e., with a constant Δd, discrimination
should be an inverse function of d, the distance from the prototype). Thus, stimuli around a prototype again would be predicted to be more
discriminable than those sampled at some distance from a prototype.
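The Weber-law argument can be made concrete with a small numerical sketch. It assumes a generalized Weber rule in which the effective perceptual noise grows linearly with the distance d from the anchor (prototype); the function name, the parameters `weber_k` and `base_noise`, and their values are hypothetical, chosen only for illustration, and are not estimates from this study. A constant noise floor keeps the prediction finite at d = 0.

```python
def predicted_sensitivity(d, delta_d=1.0, weber_k=0.2, base_noise=0.5):
    """Predicted discriminability of a fixed step delta_d located at distance d
    from the anchor, under a generalized Weber rule (hypothetical parameters).

    The effective noise is base_noise + weber_k * d, so for a constant
    delta_d the prediction is an inverse function of d.
    """
    if d < 0:
        raise ValueError("distance from the anchor must be non-negative")
    return delta_d / (base_noise + weber_k * d)


# The same physical step, evaluated near and far from the prototype.
for distance in (0, 2, 4, 8):
    print(f"d = {distance}: predicted sensitivity {predicted_sensitivity(distance):.2f}")
```

With these illustrative values, the same step is roughly four times more discriminable at the prototype (d = 0) than eight units away from it.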

The prototype (P) and nonprototype (NP) sets of stimuli in the current study are constructed with a number of common stimuli,
which can be used to evaluate differences in the scale of goodness ratings for the P and NP sets of stimuli (see the circles in Fig. 1 for
examples of common stimuli). Some general ideas from the Macmillan, Braida, & Goldberg (1987) model of auditory perception, based
upon trace and context coding concepts, would predict context differences in the goodness ratings of stimuli common to the P and NP sets.
Goodness ratings should be based upon comparisons made with respect to the entire range of stimuli used in the experiment. Furthermore,
stimuli in a P context probably should reflect considerable prior knowledge and learning because they include common, frequently
experienced representatives of the category. In contrast, stimuli from a NP context would not reflect prior knowledge and learning, as they
are poor representatives of a category and are not often encountered. These experiential differences should be reflected in different ratings
for any stimuli which are common to both the P and NP contexts.

It is known that ratings can be affected by the range and distribution of stimuli (Parducci & Perrett, 1971). Although the two
sets of stimuli in the current study are equal in the size of their physical range, the distribution of perceived quality relative to a prototype
anchor should not be equal. These differences in qualitative distributions should result in the differential use of rating scales according to
the context which is being judged, reflected in the different ratings of stimuli common to the P and NP contexts. In particular, a stimulus
rated relatively low in a P context should be rated higher if presented in a NP context, where it is relatively better than the other members of
the NP stimulus set. In summary, predictions for goodness ratings in a category anticipate different ratings according to context, and
several classification models predict increased discrimination in a prototype context.
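The range-and-distribution argument can be illustrated with a toy version of Parducci's range-frequency idea, in which a rating mixes a stimulus' position within the range of its context and its rank within that context. This is a swapped-in illustration, not a model fitted by the authors: the quality values, the equal weighting `w = 0.5`, and the function name are all hypothetical. The two contexts below span the same range of quality, yet the shared stimuli sit near the bottom of the better (P) set and near the top of the poorer (NP) set.

```python
def range_frequency_ratings(qualities, w=0.5, low=1, high=7):
    """Map each stimulus' perceived quality to a rating on a low..high scale.

    The range value places a stimulus between the worst and best members of
    its context; the frequency value is its rank within that context.
    Assumes distinct quality values, each presented equally often.
    """
    q_min, q_max = min(qualities), max(qualities)
    ordered = sorted(qualities)
    n = len(qualities)
    ratings = {}
    for q in qualities:
        range_value = (q - q_min) / (q_max - q_min)
        frequency_value = ordered.index(q) / (n - 1)
        judgment = w * range_value + (1 - w) * frequency_value
        ratings[q] = low + (high - low) * judgment
    return ratings


p_context = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]   # hypothetical qualities near the prototype
np_context = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # hypothetical poorer category members
shared = [0.5, 0.6, 0.7]                     # stimuli present in both contexts

p_ratings = range_frequency_ratings(p_context)
np_ratings = range_frequency_ratings(np_context)
for q in shared:
    print(f"quality {q}: P context {p_ratings[q]:.1f}, NP context {np_ratings[q]:.1f}")
```

In this toy case the shared stimulus with quality 0.5 is the worst member of the P set (rating 1.0) but is rated about 4.6 in the NP set, the direction of the context effect described above.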
The "Perceptual Magnet" Effect

Considering these predictions, results from recent speech research (Kuhl, 1991) are somewhat unexpected. In examining the
internal structure of a vowel category for stimuli consisting of a continuum of /i/ vowels varying in F1 and F2, Kuhl (1991) demonstrated a
"perceptual magnet" effect. Kuhl constructed two different sets of /i/ stimuli, with the set of stimuli in the prototype (P) context being more
representative of the /i/ category than the set of stimuli in the nonprototype (NP) context. Subjects rated the stimuli in each context for
goodness as an /i/ vowel, including a sampling of stimuli common to both types of contexts. The theoretical prototype, based upon prior
research (Peterson & Barney, 1952), was the highest rated stimulus across both the P and NP contexts, and was used as the standard for a
subsequent discrimination task in a prototype context. The discrimination standard for the NP context was a low rated stimulus.
Discrimination of other stimuli from the given context was measured as a function of distance in perceptual space (defined in mel units)
from the standard. For equal distances in mel units, discrimination was found to be lower for the prototype relative to the nonprototype
standard. These results were interpreted as reflecting reduced discrimination in the region of the prototype, with the prototype acting as a
type of perceptual magnet that reduces the perceived distance of neighboring stimuli.
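Kuhl's distances were defined in mel units. A minimal sketch of how a distance between two vowels in an (F1, F2) space might be computed is shown below. It assumes the widely used O'Shaughnessy conversion, mel = 2595·log10(1 + f/700), and a Euclidean metric; Kuhl (1991) may have used a different mel formula, and the formant values are illustrative only (Python 3.8+ for `math.dist`).

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (O'Shaughnessy formula; an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_distance(vowel_a, vowel_b):
    """Euclidean distance between two (F1, F2) vowels after mel conversion."""
    a = [hz_to_mel(f) for f in vowel_a]
    b = [hz_to_mel(f) for f in vowel_b]
    return math.dist(a, b)


# Illustrative (F1, F2) values in Hz for a reference /i/ and a variant.
reference = (270, 2290)
variant = (300, 2200)
print(f"distance: {mel_distance(reference, variant):.1f} mels")
```

Under this kind of metric, "equal distances" means equal steps after the nonlinear mel warping, not equal steps in Hz.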

Neither consistent goodness ratings of common stimuli across context nor reduced discrimination near a prototype is predicted
from various categorization models (Braida et al., 1984; Parducci & Perrett, 1971) or psychophysical laws (e.g., Weber's law) summarized
above. As a result, several attempts have been made to replicate the perceptual magnet effect for vowels. Lively (1993) found that
goodness ratings differed across both subjects and context. Additionally, in contrast to a magnet effect, Lively found slightly heightened
discrimination around an "average" prototype standard. When individual prototypes were used as a standard, discrimination was equal
across contexts. Lauckner-Morano & Sussman (1993) partially replicated the Kuhl study, but questioned labeling the results as a special
effect: an additional identification task provided some evidence that Kuhl's nonprototype stimuli could have included two perceptual
categories, with improved discrimination due to the measurement of between-category, not within-category, discrimination.

Following the general logic and procedures of the previous speech studies, the current study attempts to evaluate qualitative
grading and discrimination in a nonspeech category. Prior research has demonstrated that a number of speech phenomena, such as
categorical perception (Locke & Kellar, 1973; [...]), are not unique to speech. The results of the current study therefore are important
in evaluating the general nature of categorization for a natural category, and in serving as a comparison for the speech work.

General  Method 


Subjects

An assumption of prototype (and exemplar) theory is that experience is needed for the formation of good category
representations (or a reasonable pool of exemplars). If knowledge of a particular category is minimal, experience with the category is
probably insufficient to have formed a prototype. Therefore, we used 5 musically trained subjects who had a minimum of 10 years of
experience, with 2 subjects having college-level training. Subjects were paid $5/hour for their participation.

Because the concepts of prototypes and exemplars anticipate individual differences, our musical rating task maintained a within-
subjects design, with each subject providing separate ratings for P and NP sets of stimuli (Experiment 1), and then performing a
discrimination task (Experiment 2). By using a within-subjects design, the current study avoids potential individual difference problems
present in a between-subjects design, previously used in some speech research (Kuhl, 1991). For example, if ratings were averaged across
subjects in a rating task to determine a prototype standard, the averaged prototype standard may not be representative of the prototype of
an individual subject, and the subsequent discrimination results may not accurately reflect the individual's category structure relative to this
prototype. In contrast, the use of individual prototype standards for a discrimination task should more accurately reflect individual
category structure.
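The averaging problem can be seen with the individual prototypes reported in Table 1. The short sketch below is a hypothetical illustration, not an analysis from the study: it averages the E and G frequencies of the five individual prototypes and checks whether the result even lies on the 2-Hz grid from which the stimuli were drawn. The column means do not depend on how the table rows pair E with G.

```python
from statistics import fmean

# Individual prototypes (E, G) in Hz, as listed in Table 1.
individual_prototypes = [(330, 396), (328, 390), (330, 394), (328, 392), (328, 394)]

avg_e = fmean(e for e, _ in individual_prototypes)
avg_g = fmean(g for _, g in individual_prototypes)

# Stimuli were sampled in 2-Hz steps, so check whether the average lands on that grid.
on_grid = avg_e % 2 == 0 and avg_g % 2 == 0
matches_someone = (avg_e, avg_g) in individual_prototypes
print(f"averaged prototype: E = {avg_e:.1f} Hz, G = {avg_g:.1f} Hz")
print("on the 2-Hz stimulus grid:", on_grid, "| matches an individual:", matches_someone)
```

The resulting "average prototype" (E = 328.8 Hz, G = 393.2 Hz) was never presented and coincides with no listener's preferred triad.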

Stimuli 

Two sets of stimuli were constructed by generating individual sine tones (12 bit, 10 kHz sample rate) and digitally mixing them
to form triads based on a root position C major triad (see Figure 1). The initial (theoretical) prototype stimulus was a perfectly tuned C
major triad (C = 262 Hz, E = 330 Hz, G = 392 Hz). The other stimuli were generated by holding the C constant and varying the E and G
frequencies in both sharp and flat directions in 2 Hz increments. For 8 of the 30 stimuli, only the E frequency varied; for another 8 stimuli,
only the G frequency varied. Positive and negative diagonals were created by varying the E and G frequencies simultaneously with equal
steps in either the same (both flat or sharp) or opposite (one flat, the other sharp) directions. Thus, the prototype stimuli ranged from 322
Hz to 338 Hz for the E and 384 Hz to 400 Hz for the G. Nonprototype stimuli, which were created in a similar manner, were based on a
mistuned C major triad (C = 262 Hz, E = 338 Hz, G = 384 Hz) and ranged from 330 Hz to 346 Hz for the E, and 376 Hz to 392 Hz for the
G. Using these procedures, 30 stimuli differing in equal steps were created for the prototype and the nonprototype space, as summarized in
Figure 1. Of the 30 stimuli in each set, 7 occurred in both sets (see circles in Fig. 1). The sampling of stimuli generally followed that used
by Kuhl (1991) for speech stimuli.
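For readers who want to reproduce the stimulus construction, the sketch below mixes equal-amplitude sine tones into a triad at the 10 kHz sample rate and 12-bit resolution described above. The equal amplitudes, zero starting phase, absence of onset/offset ramps, and peak scaling are assumptions not stated in the text, and the 4 kHz low-pass filtering applied to the actual stimuli is not modeled.

```python
import math

SAMPLE_RATE = 10_000   # 10 kHz, as in the Stimuli section
BIT_DEPTH = 12         # 12-bit samples
DURATION_S = 1.0       # 1000 ms


def triad_samples(freqs_hz, sample_rate=SAMPLE_RATE, duration_s=DURATION_S, bits=BIT_DEPTH):
    """Mix equal-amplitude sine tones and quantize to signed integers.

    Assumptions: equal amplitudes, zero starting phase, no onset/offset
    ramps, and scaling so the worst-case sum never exceeds full scale.
    """
    max_code = 2 ** (bits - 1) - 1            # 2047 for 12-bit signed samples
    n = int(round(sample_rate * duration_s))
    scale = max_code / len(freqs_hz)          # sum of k unit sines stays within +/- k
    samples = []
    for i in range(n):
        t = i / sample_rate
        mixed = sum(math.sin(2 * math.pi * f * t) for f in freqs_hz)
        samples.append(int(round(scale * mixed)))
    return samples


prototype = triad_samples((262, 330, 392))      # perfectly tuned C major triad
nonprototype = triad_samples((262, 338, 384))   # center of the mistuned NP set
```

Holding C at 262 Hz and changing only the E and G arguments reproduces the 2-Hz stepping scheme one stimulus at a time.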

Because of the limited frequency range, equal changes in frequency provided a close approximation to equal changes in
psychophysical distance. Therefore, the cents scale was not needed to equate perceptual distance. All stimuli were 1000 ms in duration,
were low-pass filtered at 4 kHz, and were presented over TDH-49 earphones at 7x dB(A) in commercial sound chambers.
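The claim that equal frequency steps approximate equal psychophysical steps over this narrow range can be checked by expressing a 2-Hz step in cents (1200·log2 of the frequency ratio). In the sketch below, the ranges are the pooled E and G ranges of the P and NP sets given above; the function and variable names are illustrative.

```python
import math


def cents(f_low, f_high):
    """Interval size in cents between two frequencies."""
    return 1200.0 * math.log2(f_high / f_low)


STEP_HZ = 2  # frequency increment used to build both stimulus sets

# E and G ranges pooled across the P and NP sets (Stimuli section).
ranges = {"E": (322, 346), "G": (376, 400)}
for note, (lo, hi) in ranges.items():
    first = cents(lo, lo + STEP_HZ)
    last = cents(hi - STEP_HZ, hi)
    print(f"{note}: 2-Hz step = {first:.2f} cents at {lo} Hz, {last:.2f} cents at {hi} Hz")
```

A 2-Hz step is about 10.7 cents at the bottom of the E range and about 10.0 cents at the top, and about 9.2 versus 8.7 cents across the G range, so within each note the step size in cents varies by roughly 6 to 7 percent.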


Insert Figure 1 here
Experiment 1

The goal of Experiment 1 was to use goodness ratings to evaluate qualitative grading in a musical category. In addition, each
individual's highest rated stimulus was used to identify their prototype standard for a subsequent discrimination task (Experiment 2).
Procedure

Subjects were instructed to rate the goodness of each stimulus as a major chord on a scale of 1 (very poor) to 7 (very good).
Ratings were indicated by button presses on a telephone keypad and were collected by computer. To become familiar with the stimulus
range, subjects listened to the 30 stimuli from a given context once in random order without responding. All stimuli then were presented
three more times to provide practice using the scale. Data finally were collected from 20 randomized repetitions of the stimulus set. The
procedure then was repeated for the other context, with context order counterbalanced across subjects.
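The presentation sequence for one context can be sketched as below. It is a hypothetical illustration: stimuli are integer indices, every pass is an independent random shuffle of the 30 stimuli, and the phase labels are names introduced here, not terms from the study.

```python
import random


def rating_schedule(stimuli, practice_blocks=3, data_blocks=20, seed=None):
    """Return (phase, stimulus) presentations for one context.

    One unrated familiarization pass, then practice passes, then the
    randomized data repetitions; each pass is a fresh shuffle (an assumption).
    """
    rng = random.Random(seed)
    phases = ["familiarize"] + ["practice"] * practice_blocks + ["data"] * data_blocks
    schedule = []
    for phase in phases:
        block = list(stimuli)
        rng.shuffle(block)
        schedule.extend((phase, s) for s in block)
    return schedule


schedule = rating_schedule(range(30), seed=1)
```

Each context thus involves 24 passes through the set (720 presentations), of which 600 rated presentations contribute data.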
Results and Discussion

Goodness ratings for the musical task (a) declined systematically from the individually defined (i.e., highest rated stimulus)
prototype and (b) improved when moving from the nonprototype center stimulus toward the prototype [for P and NP contexts, respectively,
F(4,16) = 38.38, p < .0001; F(4,16) = 49.72, p < .0001; see Figure 2, panels A and B]. As Table 1 shows, the highest rated stimulus
varied across subjects, but always occurred within 1 or 2 city block steps (2 or 4 Hz) of the perfectly tuned triad. Ratings of the
individual and theoretical prototypes are significantly different [F(1,4) = 22.17, p < .01]. The absence of perfect tuning in each individual's
prototype was not completely surprising, as several studies have shown that musical intervals are often compressed (Rakowski, 1976) or
stretched (Hartmann, 1993; Ward, I9M). Speech research (Lively, 1993) also found ratings to differ across subjects.
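Because each subject contributes both an individual-prototype rating and a theoretical-prototype rating, this comparison is a two-level within-subjects test, for which the repeated-measures F(1, n - 1) equals the square of a paired t. The sketch below recomputes a paired t from the rounded values printed in Table 1, assuming the rows pair in the order listed. It gives t(4) ≈ 4.97 (t² ≈ 24.7), in the same range as the reported F(1,4) = 22.17; exact agreement is not expected, since the table entries are rounded and the published test may have been computed from unrounded data.

```python
import math
from statistics import mean, stdev

# Table 1 ratings, assuming rows pair in the order printed (Subjects 1-5).
individual = [5.3, 5.9, 6.0, 4.9, 5.9]
theoretical = [4.3, 5.6, 5.1, 4.1, 4.5]

diffs = [a - b for a, b in zip(individual, theoretical)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))   # paired t with n - 1 = 4 df
print(f"mean difference = {mean(diffs):.2f}, t({n - 1}) = {t:.2f}, t^2 = {t * t:.1f}")
```

Every subject rated their own prototype above the theoretical one, by 0.3 to 1.4 scale points.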

The rating of the theoretical prototype also significantly differed across contexts [...]. [...] the fact that the perfectly tuned triad
received the highest average rating [...]. The ratings of the other 6 shared stimuli also differed across contexts, with all shared stimuli
[...]. The shared stimuli were also significantly different across contexts [F(1,4) = 13.08, p < .005] but, overall, received lower ratings
than the theoretical prototype.

Recent speech research (Lively, 1993) also found that vowel goodness ratings of shared stimuli differed across context.

Insert Table 1 here


Because subjects differed in the average and range of ratings employed in the P and NP contexts (reflecting differences in the use
of the scale), it was difficult to legitimately either pool results across subjects or compare results across conditions (P or NP contexts). In
order to better equate the ratings across subjects, individual ratings for the P context were normalized to yield a mean rating of 3.5 (center
of rating range) and a standard deviation of 1.8 (selected to keep all ratings between 1 and 7). These normalized data then were averaged
to yield the rating results shown in Figure 2. The individual data for the NP context also were normalized, to a mean of 2.03 (thus
matching the rating of the theoretical prototype across the two contexts) and a standard deviation of 1 (the original SD averaged across
all subjects in the NP context). While the normalized results do not accurately reflect obtained rating differences between the theoretical
prototype and nonprototype standards, they do reflect the relative rating tendencies both within and across contexts, with stimuli in the NP
context being rated much lower than those in the P context.
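The normalization described above is a linear rescaling of each subject's ratings to a chosen mean and standard deviation. A minimal sketch follows; the raw ratings are hypothetical, and the use of the population SD (`pstdev`) rather than the sample SD is an assumption, since the text does not say which was matched.

```python
from statistics import fmean, pstdev


def normalize_ratings(ratings, target_mean, target_sd):
    """Linearly rescale one subject's ratings to a target mean and SD.

    Assumption: the population SD is matched; rank order is preserved.
    """
    m, s = fmean(ratings), pstdev(ratings)
    if s == 0:
        raise ValueError("ratings have zero variance")
    return [target_mean + target_sd * (r - m) / s for r in ratings]


# Hypothetical raw ratings from one subject in each context.
raw_p = [6, 6, 5, 5, 4, 4, 3, 3]
raw_np = [3, 3, 2, 2, 2, 1, 1, 1]

norm_p = normalize_ratings(raw_p, target_mean=3.5, target_sd=1.8)
norm_np = normalize_ratings(raw_np, target_mean=2.03, target_sd=1.0)
```

Because the transformation is linear, it changes each subject's scale usage but not the ordering of stimuli within a context.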


Summary. Individual differences in optimum stimulus ratings were expected and can probably be attributed to experiential
factors, such as differences in the major instrument of study for the 5 subjects. In addition, most individual prototypes were slightly flat
compared to the theoretical prototype. This slightly flat mistuning is consistent with the tradition of equal temperament, which slightly
compresses major thirds and slightly stretches perfect fifths. Rating differences for common stimuli across contexts are consistent with
predictions based on the range and qualitative distribution of stimuli.
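To make the tuning comparison concrete, the sketch below expresses in cents the C-E and C-G intervals (C fixed at 262 Hz) for the theoretical prototype and for the E and G values that appear among the individual prototypes in Table 1, alongside the just-intonation ratios 5:4 and 3:2 as fixed reference points. The computation only reports interval sizes; it does not by itself establish which tuning system listeners have internalized.

```python
import math


def cents(f_low, f_high):
    """Interval size in cents between two frequencies."""
    return 1200.0 * math.log2(f_high / f_low)


JUST_MAJOR_THIRD = cents(4, 5)   # 5:4 ratio
JUST_FIFTH = cents(2, 3)         # 3:2 ratio
ROOT_HZ = 262                    # C was held constant in every stimulus

print(f"reference 5:4 = {JUST_MAJOR_THIRD:.1f} cents, 3:2 = {JUST_FIFTH:.1f} cents")
for e_hz in (328, 330):
    print(f"C-E with E = {e_hz} Hz: {cents(ROOT_HZ, e_hz):.1f} cents")
for g_hz in (390, 392, 394, 396):
    print(f"C-G with G = {g_hz} Hz: {cents(ROOT_HZ, g_hz):.1f} cents")
```

For example, moving E from 330 to 328 Hz narrows the C-E third from about 399.5 to about 389.0 cents, compared with 386.3 cents for a 5:4 third.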

Experiment  2 

Experiment 1 demonstrated that musical categories are clearly qualitatively graded, with one stimulus always receiving a
distinctively higher rating than the other category members. (The specific optimum stimulus may vary by individual.)
Furthermore, the goodness ratings of stimuli decrease in a relatively systematic fashion as a function of distance from the prototype. It can
be inferred from these findings that musical categories may be structured around prototypes or frequent exemplars (i.e., the highest rated
stimulus). Therefore, a discrimination task similar to that used in several speech studies (Kuhl, 1991; Lively, 1993) should be able to
effectively evaluate the function of a prototype in this natural, nonspeech category. Classification models (Braida et al., 1984) and Weber's
law predict that discrimination should be higher in a prototype context than in a nonprototype context.
Procedure 

Because the musical rating task revealed individual differences, we used a within-subjects design for the discrimination task, with
each subject's highest rated stimulus from the goodness rating task employed as that subject's prototype standard. Additional support for
using a within-subjects design comes from Lively (1993), where the pattern of discrimination differed slightly, depending on whether an
averaged prototype or the individual's prototype was used as a standard. Since, for most of our subjects, the prototype was within 1 city
block step (2 Hz) of the perfectly tuned triad (which was the center of the P stimulus set), the nonprototype standard was selected
to be one step from the center of the NP stimulus set. Unlike the prototype standard, the nonprototype standard has no special significance
other than representing a control condition, so there was no need to modify the selected standard for individual subjects. Therefore, a single
nonprototype standard was used for all subjects.

Because Kuhl (1991) wanted to use a single procedure for human infants and adults, as well as monkeys, she used a go/no-go
procedure. However, this procedure is highly sensitive to response bias. A go/no-go task is essentially equivalent to an ABX task, and a
recent study of ABX discrimination of stimuli drawn from a continuum between major and related minor triads (Howard, Rosen, &
Broad, 1992) demonstrated significant bias. Therefore, we used a 2IFC task, which should provide a more reliable discrimination measure
for each stimulus (28 repetitions for each stimulus vs. 2 in the Kuhl (1991) discrimination task) and which is largely bias-free.
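A forced-choice proportion correct is often converted to a sensitivity index. The sketch below uses the standard 2IFC relation d' = √2·z(PC), treating the task as a standard 2IFC; if the two-pair trial is instead modeled as a 4IAX design, a different mapping from proportion correct to d' applies. The 1/(2N) correction for perfect scores is one common convention and an assumption here, not the authors' procedure.

```python
from statistics import NormalDist


def dprime_2ifc(proportion_correct, n_trials=None):
    """Convert 2IFC proportion correct to d' via d' = sqrt(2) * z(PC).

    If n_trials is given, PC is clipped to [1/(2N), 1 - 1/(2N)] so that
    perfect or zero scores still yield a finite d' (assumed convention).
    """
    pc = proportion_correct
    if n_trials:
        pc = min(max(pc, 1 / (2 * n_trials)), 1 - 1 / (2 * n_trials))
    return (2 ** 0.5) * NormalDist().inv_cdf(pc)


# Chance performance, a moderate score, and a perfect score over 28 trials.
for pc in (0.5, 0.76):
    print(pc, round(dprime_2ifc(pc), 2))
print("perfect over 28 trials:", round(dprime_2ifc(1.0, n_trials=28), 2))
```

With 28 repetitions per stimulus, the clipped ceiling corresponds to a d' of roughly 3, which is worth keeping in mind when near-perfect performance is reported.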

Subjects heard two pairs of triads per trial, consisting of a same and a different pair. In each pair, the standard always was
presented first. In "same" pairs the standard (P or NP) was repeated. The second stimulus in the "different" pair was randomly selected
from the 29 other triads. The ordering of pairs within a trial was randomized. Subjects indicated with a button press which pair contained
the different triads. The experiment was run in two sessions, with each session containing both the P and NP contexts. Selection of the
initial context (P or NP) was counterbalanced across subjects, and the context initially presented in the first session was presented last in
the second session. Subjects were presented each triad a total of 28 times.
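The trial structure just described can be sketched as a small generator. It is a hypothetical illustration: stimuli are integer indices, each of the 29 comparison triads is paired with the standard 28 times (matching the stated total per triad), and the split of those repetitions across the two sessions is not modeled.

```python
import random


def discrimination_trials(standard, comparisons, reps_per_comparison=28, seed=None):
    """Build randomized two-pair trials for one context.

    Each trial holds a 'same' pair (standard, standard) and a 'different'
    pair (standard, comparison), with the standard first in both; the order
    of the two pairs is random. Returns (pairs, index_of_different_pair).
    """
    rng = random.Random(seed)
    others = [c for c in comparisons if c != standard]
    trials = []
    for comp in others:
        for _ in range(reps_per_comparison):
            same, diff = (standard, standard), (standard, comp)
            if rng.random() < 0.5:
                trials.append(((same, diff), 1))
            else:
                trials.append(((diff, same), 0))
    rng.shuffle(trials)
    return trials


trials = discrimination_trials(standard=0, comparisons=range(30), seed=3)
```

Scoring a response then reduces to checking whether the chosen pair index equals the stored index of the different pair.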


Results for both contexts are shown in Figure 3. As predicted by traditional models, discrimination was better in the P context than in
the NP context. [...] similar trend, with performance jumping to near perfect performance for a difference of two city block steps (4 Hz
change) from the standard. In fact, one subject demonstrated virtually perfect discrimination in the P, but not the NP, context. All of these
findings are consistent with the predictions from categorization and perceptual models.

Insert  Figure  3  here 

General  Discussion 

The current study investigated aspects of perception for stimuli varying systematically in the frequency of two steady-state
components. Goodness ratings revealed that the theoretical musical prototype (the perfectly tuned triad) was never rated the highest in the
P context and that the optimum stimulus varied across subjects, but that all stimuli common to both sets (P and NP) were rated higher in
the NP context. As discussed in the introduction, context effects such as these are consistent with at least two similar concepts. First,
ratings have been shown to be affected by the range and distribution of stimuli. While the music stimulus contexts (P or NP) were of equal
physical range, the qualitative perceptual range was not equal. Therefore, different ratings for the same stimuli embedded in different contexts
are expected. Similarly, different ratings for common stimuli across different contexts (P or NP) are consistent with context coding
concepts, where comparisons of stimuli are made with respect to the entire range of stimuli used in the experiment, and thus should be a
function of the stimulus set.

Discrimination for equal distances from a standard in a musical category was better in a prototype context than in a
nonprototype context. While in contrast to predictions based on a perceptual magnet, these discrimination findings from a natural category
are consistent with typical psychophysical laws and suggest that discrimination is achieved with the use of some form of anchors or
reference points (Braida et al., 1984; Macmillan, Braida, & Goldberg, 1987).

Comparison  to  Recent  Speech  Studies 

Recent speech research investigating qualitative grading in a vowel category (Kuhl, 1993; Lively, 1993) has shown ratings of
optimum stimuli to differ across individuals. The current findings for a musical category, along with these recent speech results, indicate
that some individual differences are probably typical of most general perceptual processes, and future research should investigate the small
discrepancies between true individual prototypes and theoretical prototypes.

Lively (1993) found that all goodness ratings of vowel stimuli common to both contexts (P and NP) were higher in the NP
context than in the P context. In contrast, the original vowel rating study (Kuhl, 1991) did not demonstrate clear rating differences across
contexts, and the same stimulus received the highest rating in both contexts. Variable ratings across contexts are consistent with predictions
based on the range and qualitative distribution of stimuli, while consistent ratings across contexts are not.

The differences in the goodness ratings across the speech and nonspeech studies may be a function of stimulus selection.
While the present study shows a consistent pattern of ratings for stimuli common to both sets, it cannot directly evaluate the stability of
ratings for the stimuli which served as the actual prototype stimulus for the various subjects. Due to the way the prototype stimulus set was
constructed, and because the individual prototypes never coincided with the theoretical prototype, the actual individual prototype was never
present in the NP context. If each individual's prototype had been present in the NP context, it is possible that this stimulus would have
functioned as a perceptual anchor, thus removing range effects.

Discrimination results from the current study differ from the summarized speech findings. Kuhl (1991) found better
discrimination in a NP context than in a P context, and interpreted these results as demonstrating reduced performance in the P context, with
the prototype exerting a magnet effect on surrounding stimuli, as opposed to enhanced performance in a NP context. While the Kuhl
conclusion is not consistent with either a Weber function or anchor and reference point notions, it seems to be based upon the logical
assumption that, other factors being equal, performance differences should be attributed to the action of a prototype, rather than some
unknown factor enhancing performance in the NP context. This logical reasoning leads to the conjectured "perceptual magnet" metaphor.
In an attempted replication, Lively (1993) found slightly better discrimination in a P context with the prototype standard defined by Kuhl,
and when individual prototypes were used as a standard, discrimination was virtually equal across contexts. Neither discrimination result
is consistent with classification and perceptual models, which predict heightened discrimination in a P context, as found in the current
study.

Why do the music results differ from the original speech results in terms of a perceptual magnet effect for the prototype? One
possibility is that music and speech categorization processes are qualitatively different. Alternatively, it may be possible that Kuhl (1991)
had actually found enhanced discrimination in a NP context, not reduced discrimination in a P context. As noted above, Lauckner-Morano
& Sussman (1993) used a labeling task to show that Kuhl's original speech nonprototype context may have contained two
categories. If this indication is an accurate assessment of the original NP context vowel stimuli, then Kuhl's original findings may not be
indicative of a magnet effect, but rather of artificially enhanced discrimination in the nonprototype context due to between-category
comparisons. Thus, additional speech research is necessary to study the nature of prototype and nonprototype discrimination under
[...] conditions. The present results from another natural category, while not addressing the validity of the speech
results, [...] for musical triads and can also serve as a comparison for speech findings.


References 

Braida.L.D ,  Durlach,  N  1.,  I  im.  J.S ,  Berliner.  J  K..  Rabmowilz,  W.M.,  A  Purks,  S.R.  (19X4)  Intensity  Perception.  XIII  Perceptual 
anchor  model  of  context  coding  Journal  of  the  Acoustical  Society  of  America.  76.  722-7 3 1 

Bums,  KM,  &  Ward,  W.I.  ( I97X).  Categorical  Perception  -  phenomenon  or  epiphenomenon  Evidence  from  experiments  in  the  perception 
of  melodic  musical  intervals.  Journal  of  the  Acoustical  Society  of America.  63,  456-468. 

Grieser,  D  L  &  Kuhl,  P  K  (1989)  Categorization  of  speech  by  infants:  Support  for  speech-sound  prototypes  Developmental 
Psychology.  25.  577-588 

Hall,  M.D.  A  Pastore,  R.E.  (1992).  Musical  duplex  perception  :  Perception  of  figurally  good  chords  with  subliminal 

distinguishing  tones  Journal  of  Experimental  Psychology:  Human  Perception  and  Performance.  18,  752-762 

Hartmann,  W.M  (1993).  On  the  origin  of  the  enlarged  melodic  octave.  Journal  of the  Acoustical  Society  of America,  93,  3400- 
3409 

Kuhl.  P.K.  (1991).  Human  adults  and  human  infants  show  a  "perceptual  magnet  effect"  for  the  prototype  of  speech  categories,  monkeys 

do  not.  Perception  and  Psychophysics.  50.  93-107. 

Davis,  K.  &  Kuhl,  P  (1993,  May).  Acoustic  correlates  of  phonetic  prototypes  Velar  slops.  Poster  presented  at  the  1 25th  meeting  of  the 
Acoustical  Society  of  America.  Ottawa,  Canada. 

I  .auckner-Morano ,  V.,  A  Sussman,  J.E.  (1993,  May).  Identification  and  change/no-change  discrimination  of  hi  stimuli:  Further 

tests  of  the  "magnet"  effect  Paper  presented  at  the  1 25th  meeting  of  the  Acoustical  Society  of  America  Ottawa,  Canada 

!.i,  X  A  Pastore,  R  E.  (1992).  Evaluation  of  prototypes  in  perceptual  space  for  a  place  contrast  In  M.E.H.  Schoulen  (Ed.)  I  he 
Auditory  Processing  of  Speech  New  York:  de  Gniyter,  303-308 

lively,  S.K.  (1993,  Mav).  An  examination  of  the  perceptual  magnet  effect.  Paper  presented  at  the  125th  meeting  of  the  Acoustical 
Society  of  America.  Ottawa.  Canada 

l-ocke,  S  ,  A  Kellar,  I,.  (1973)  Categorical  perception  in  a  nonlinguistic  mode  Cortex.  9.  355-369 


Macmillan.  N  A,  Braida,  L.D  .  Goldlrerg.  R  I  (1987).  Central  and  peripheral  processing  in  the  perception  of  speech  and  nonspcech 
sounds  In  M  K.  H  Schoulen  (Ed).  7 he  Psychophysics  oj  Speech  Perception.  Dordecht,  The  Netherlands  Martinus 
Nijhoft  Publishers 

Medin,  DL  A  Schaffer,  MM  (1978)  Context  theory  of  classification  learning  Psychological  Review.  97,  225-252. 

Nosofskv.  R.M  (1991)  Tests  of  an  exemplar  model  for  relating  perceptual  classification  and  recognition  memory  Journal  oj 
Experimental  Psychology'  Human  Perception  and  Performance  17.  3-27. 

Nosofskv,  R.M  .  Palrnen,  TP,  McKinley.  S  (1993.  November)  Rulc-Plus-Exception  model  of  classification  learning.  Paper 
presented  at  the  .14Ui  meeting  of  the  Psvchoiiomic  Society,  Waslungton  D  C 

Parducci.  A.  A  Perrct,  I .  F  (1971)  Catcgon  rating  scales:  Effects  of  relative  spacing  and  frequency  Journal  of  Experimental 
Psychology  Monographs.  89.  427-452 

Pastore.  R  1  .  SchmucKIcr.  M  A  .  Rosenhlum.  I  .  A  S/c/esiul.  R  ( 19X3)  Duplex  perception  with  musical  stimuli  Perception 
and  Psychophysics.  33.  469-474 

Peterson.  0  I  .  A  Bamey.  H  I.  ( 1952)  Control  methods  used  in  a  study  ol  llie  vowels  Journal  of  the  Acoustical  Society  of  Ament,  a. 
24  175-184 

Posner.  M  1  ,  A  Keek*.  S  W  (1968)  On  the  genesis  ol  abstract  ideas  Journal  of  Experimental  Psychology.  77.  353-363 

RaRowski.  A  ( 1976)  Tuning  of  isolated  musical  intervals.  Journal  of  the  Acoustical  Society  of  America,  59.  S50  (A) 

Samuel.  A  ( i  (1982)  Phonetic  prototypes  Perception  and  Psychophysics.  3 1 .  307-3 1 4 

Siegel, J. A., & Siegel, W. (1977). Categorical perception of tonal intervals: Musicians can't tell sharp from flat. Perception and
Psychophysics, 21, 399-407.

Studdert-Kennedy, M., Liberman, A. M., Harris, K. S., & Cooper, F. S. (1970). Motor theory of speech perception: A reply to Lane's
critical review. Psychological Review, 77, 234-249.

Volaitis, L. E., & Miller, J. L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of
voicing categories. Journal of the Acoustical Society of America, 92, 723-735.




Table 1

Individual and Theoretical Prototype Ratings in a Prototype Context

            Individual Prototype Stimulus              Theoretical Prototype
            E Frequency   G Frequency   Rating         (E = 330 Hz, G = 392 Hz) Rating
Subject 1   330 Hz        396 Hz        5.3            4.3
Subject 2   328 Hz        390 Hz        5.9            5.6
Subject 3   330 Hz        394 Hz        6              5.1
Subject 4   328 Hz        392 Hz        4.9            4.1
Subject 5   328 Hz        394 Hz        5.9            4.5


Chord  Discrimination 




Acknowledgements

This research was supported by grants F49620-93-1-0033 and F49620-93-1-0327 from the Air Force Office of Scientific Research.
The opinions expressed are those of the authors and do not necessarily represent those of the granting agency.


Figure Captions

Figure 1. Perceptual space for C-major triads.

Figure 2. Normalized goodness rating summary for the prototype and nonprototype contexts. This figure parallels the structure of Figure
1, and the shared stimuli are equivalent to the stimuli represented by circles in Figure 1. Bars representing a rating of 1 do not appear.

Figure 3. Discrimination as a function of distance from a standard. Because individual prototypes were used, the same stimuli do not
necessarily represent the same distances from individual prototypes. Therefore, some of the data points for the prototype standard may not
include all five subjects.


[Figures: Perceptual Space for C-major Chords (prototype space); Prototype Rating Summary]


Normalization  of  Musical  Instrument  Timbre 


Jennifer L. Cho, Michael D. Hall, & Richard E. Pastore
Center for Cognitive & Psycholinguistic Sciences

State University of New York

Binghamton,  NY  13902-6000 

Abstract 

The perceptual system appears to engage in an active, time-consuming process called normalization that maintains
perceptual constancy by adjusting for source differences. Experiment 1 attempted to demonstrate music normalization for
task-irrelevant timbre variability in a chord discrimination task. Experiment 2 demonstrated that music normalization is an active
process. Experiment 3 identified important global components of instrument timbre that should be differentially subject to
normalization. Based upon assumptions about the nature of speaker normalization, it was expected that perceived timbral
similarity would produce faster response times, while dissimilar timbres would result in longer response times. A similarity
scaling procedure was used to assess contributions of temporal (or attack) and spectral (upper harmonics) components to timbre
for intact and physically altered natural stimuli. Results indicate that, for the relatively long stimuli used in the current study,
timbre was based primarily on the nature of the upper harmonics, with little contribution from attack functions. The relevance
of these stimulus properties to normalization was evaluated in Experiment 4 using an AX chord discrimination task for a selected
subset of Experiment 3 stimuli. Normalization, as indicated by RT, was inversely related to similarity. Information present in
the higher harmonics also appeared to be most relevant to normalization.


Normalization is a type of perceptual constancy that can be loosely defined as the process by which the perceptual
system adjusts for differences between sources in order to preserve an intended perceptual message. Normalization is an
important concept in the speech perception literature, where differences in vocal apparatus, context, and articulation are among the
many factors that determine the unique characteristics of words spoken by different talkers (e.g., Johnson, 1988; Jusczyk,
Pisoni, & Mullennix, 1989; Mullennix & Pisoni, 1989; Mullennix, Pisoni, & Martin, 1989; Summerfield & Haggard, 1975).

Despite extensive variability among speakers, listeners typically comprehend utterances produced by a wide range of
talkers under a variety of conditions with ease. This ability to recognize and adjust to task-irrelevant differences among speakers has led
investigators to assume that there exists a mechanism that "normalizes" the disparities in speaker voice characteristics to
efficiently maintain perceptual constancy in perceiving speech signals (e.g., Logan, 1989; Nusbaum & Morin, 1989).

Research has demonstrated that the normalization process is time-consuming and resource-demanding. For example,
Allard (1976), measuring reaction time (RT) in an AX task for word stimuli varying in speaker, noted that "same word"
decisions tended to be faster when the two stimuli were physically identical than on trials with changes in either speaker or
intonation. Verbrugge et al. (1976) showed that identification of natural vowels was more accurate when the stimuli were tokens
produced by a single talker rather than tokens produced by a number of talkers.

Recent concerns expressed by Goldinger (1992) suggest that extant normalization research and theory have not
made a compelling case for a speaker-normalization process. The argument that effects of speaker variability are attention-demanding
implies that speaker variability should impair performance of subjects operating under time constraints, even if no
normalization process is engaged. According to Goldinger, virtually all "normalization effects" could be due to mere distraction
from irrelevant source variability, rather than an actual normalization for speaker. In essence, simply demonstrating effects of
stimulus variability does not distinguish between normalization as a separate process and normalization as a reflection of typical
limitations on processing imposed by added stimulus variability. We use the term "passive normalization" for the latter
conceptualization, since the perceptual system is simply conjectured to respond directly to information in the stimulus.

The passive normalization hypothesis assumes that the perceptual system evaluates stimulus differences based upon
some decision metric that is monotonically related to signal-to-noise ratio (S/N). Following a Signal Detection Theory analysis,
the difference between any pair of stimuli defines the difference in central tendency between the distributions of same and
different events, and thus the magnitude of S. The decision system has noise due to variability in the stimuli and in stimulus
coding, thus defining N. Adding variability to a given comparison (with fixed S) therefore increases N and results in a
corresponding decrease in S/N. Slower and poorer performance would be due to the more difficult discrimination (reflecting the
lower S/N). Therefore, according to this description, normalization is not a separate perceptual process. Rather, the decrements
in performance simply reflect different levels of signal quality relative to noise. This conceptualization of normalization is
theoretically uninteresting (as Goldinger might agree) because it implies that stimulus variability effects are not the result of
some form of adaptive central processing, but instead reflect the standard operation of relatively static signal-processing
mechanisms.
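The passive account can be made concrete with a small signal-detection sketch. It assumes, beyond what the text states, that coding noise and task-irrelevant stimulus variability are independent Gaussian sources whose variances add, so that N is the square root of the summed variances and d' plays the role of S/N. The function name `passive_dprime` is hypothetical.

```python
import math


def passive_dprime(signal, coding_sd, stimulus_sd=0.0):
    """Return d' = S/N when independent Gaussian noise sources add in variance.

    signal      -- separation S between the 'same' and 'different' distributions
    coding_sd   -- SD of internal (coding) noise
    stimulus_sd -- extra SD contributed by task-irrelevant variability (e.g., timbre)
    """
    noise = math.sqrt(coding_sd ** 2 + stimulus_sd ** 2)  # N grows with added variability
    return signal / noise


# With S held fixed, adding irrelevant variability can only lower S/N.
for sd in (0.0, 0.5, 1.0, 2.0):
    print(f"stimulus_sd={sd}: d'={passive_dprime(2.0, 1.0, sd):.3f}")
```

Nothing in this model adapts: the loss in S/N follows directly from the added variance, which is exactly what makes the passive account theoretically uninteresting.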

An active normalization hypothesis begins with the conditions described for passive normalization, but assumes that the
perceptual system is able to respond to, and then factor out, some expected task-irrelevant stimulus variability. The system is
conjectured to accurately anticipate the nature of the variability, then effectively reduce N, thus increasing S/N. Incorrect
anticipation can result in an inappropriate reduction of S, an increase in N, or both, any of which may culminate in an overall
decrease in S/N. Altering the system to accommodate the appropriate nature of the variability is assumed to require time and
to utilize processing capacity. The increase in RT would reflect the time needed by these processes to restore S/N, which, in turn,
should result in high response accuracy. Consistent with this S/N characterization, Summerfield and Haggard (1975) have




Timbre  Normalization  2 

suggested that reaction time results may reflect the natural operation of the normalization process, whereas the decrease in
accuracy when responding to tokens produced by different sources may simply indicate that the normalization process is not
perfect. Thus, the accuracy effects observed in the speech literature may reflect some degree of failure to normalize. In
order to provide a convincing argument for any normalization process, one must demonstrate that the listener actively uses
knowledge or memory of particular sources in adjusting for their variability.
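The active account differs from the passive one only in allowing the system to remove part of the anticipated variability. In the hypothetical sketch below, `k` is the fraction of irrelevant variability factored out (1 for exact anticipation) and `bias` is an amount by which a mistaken expectation erodes S; both parameters are illustrative assumptions, and the sketch does not model the time cost that the text attributes to the adjustment.

```python
import math


def active_dprime(signal, coding_sd, stimulus_sd, k=1.0, bias=0.0):
    """Hypothetical active-normalization sketch of S/N.

    k    -- fraction of the anticipated irrelevant variability that is factored out
            (k = 0: passive case; k = 1: exact anticipation; k > 1: over-correction)
    bias -- amount by which a mistaken expectation erodes the signal S
    """
    residual_sd = (1.0 - k) * stimulus_sd                 # variability left after adjustment
    noise = math.sqrt(coding_sd ** 2 + residual_sd ** 2)  # effective N
    effective_signal = max(signal - abs(bias), 0.0)       # effective S
    return effective_signal / noise


print(active_dprime(2.0, 1.0, 1.0, k=0.0))            # passive: no adjustment
print(active_dprime(2.0, 1.0, 1.0, k=1.0))            # exact anticipation restores S/N
print(active_dprime(2.0, 1.0, 1.0, k=1.5, bias=0.5))  # misanticipation lowers S/N
```

With `k = 0` the function reduces to the passive case; over-correction (`k > 1`) or a nonzero `bias` lowers S/N, matching the point that incorrect anticipation can reduce S, increase N, or both.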

Normalization  and  Instrument  Timbre 

The  present  investigation  attempts  to  demonstrate  the  existence  of  an  active  normalization  process  in  perceiving  music 
stimuli.  The  use  of  music  stimuli  may  offer  an  alternative  forum  in  which  to  investigate  the  nature  of  normalization  processes. 
Also,  the  sometimes  simpler  structure  of  music  stimuli  (Handel,  1989)  may  enable  us  to  gain  a  better  understanding  of  the 
processes  implicated  in  normalization. 

In speech normalization, variations among speakers are normalized in the perception of words; if there is an analogous
music normalization process, then variations in instrument timbre should be normalized in the perception of triads. Timbre is
the subjective attribute of a source (instrument) that is based on invariant properties that uniquely characterize the tones produced
by the source. Unfortunately, the pursuit of an adequate definition of timbre is both related to and dependent upon establishing
which characteristics (or combination of characteristics) are important for perceptually determining an instrument's distinctive
sound quality. As a result, existing "definitions" of timbre tend to focus as much upon what does not constitute timbre as upon what
factors actually contribute to timbre. Thus, the American Standards Association (1960, p. 45) defines timbre only as "that
attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same
loudness are dissimilar."

In  addition  to  determining  the  possible  nature  of  normalization,  the  current  research  also  seeks  to  identify  some 
important  global  components  of  instrument  timbre  that  may  be  differentially  subject  to  normalization.  Speech  normalization 
research to date has established that the auditory system adjusts for variability in the articulator (e.g., a speaker's vocal tract)
(Nusbaum  &  Morin,  1989).  However,  only  limited  research  has  been  conducted  to  specify  what  kind  of  variability  the  system  is 
anticipating.  By  determining  the  bases  of  listener  expectations  and  their  relations  to  physical  characteristics  of  the  signal,  we  may 
also  utilize  normalization  as  a  tool  for  investigating  perceptual  processing.  In  the  process,  we  also  should  be  able  to  effectively 
address  the  issue  raised  by  Goldinger  about  whether  the  loss  in  speed  is  simply  the  result  of  (passive)  perceptual  limitations,  or 
some  active  (normalization)  process  in  which  the  system  adjusts  itself  to  factor  out  certain  expected  irrelevant  properties. 

Relevant  Properties  of  Music  Stimuli 

An  implicit  assumption  in  most  of  the  speaker  normalization  research  has  been  that  listeners  make  some  judgment  or 
comparison based on the similarity between two tokens (Logan, 1990). If the two tokens are highly similar, they should be
judged  as  originating  from  the  same  speaker  and  would  not  be  subject  to  normalization  processes.  If  the  tokens  are  highly 
dissimilar,  they  should  be  judged  as  originating  from  different  speakers  and  would  be  subject  to  normalization.  Paralleling  this 
logic,  the  same  assumption  should  apply  for  the  postulated  music  normalization,  substituting  instrument  timbre  for  speakers. 
However,  in  order  to  test  this  assumption,  it  becomes  important  to  determine  what  stimulus  properties  define  instrument  timbre 
for  listeners. 

One promising approach to understanding the physical correlates of timbre has come from research on the perception
of systematically altered stimuli, which has identified some attributes of waveforms that may be important in instrument
recognition. For each partial of synthetic stimuli (modeled after natural stimuli), Grey (1977) identified an attack transient, an
intermediate steady state, and a decay. Removal of the initial 20-50 ms segments of his 250-ms stimuli resulted in a significant
impairment in the ability to identify different instruments. Therefore, it would appear that onsets (or attacks) of musical tones
may contain essential cues for identification and discrimination of instrument timbre. Grey's findings seem to confirm the
suggestion by Saldanha and Corso (1964) that onset cues are more significant than offset cues in discrimination tasks
for music stimuli.
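The onset-removal manipulation can be illustrated on sampled audio. Grey's editing procedure and sampling rate are not given here, so this sketch borrows the 10-kHz rate described for Experiment 1 and uses a hypothetical synthetic tone whose linear onset ramp stands in for an attack transient. A real edit would probably also ramp the new onset to avoid a click; that step is omitted.

```python
import math


def synth_tone(freq_hz, dur_ms, fs_hz=10_000, attack_ms=30.0):
    """Toy sine tone with a linear onset ramp (a stand-in for an attack transient)."""
    n = round(fs_hz * dur_ms / 1000)
    n_attack = max(1, round(fs_hz * attack_ms / 1000))
    return [min(1.0, i / n_attack) * math.sin(2 * math.pi * freq_hz * i / fs_hz)
            for i in range(n)]


def remove_onset(samples, cut_ms, fs_hz=10_000):
    """Drop the first cut_ms of a sampled signal, i.e., excise the attack portion."""
    return samples[round(fs_hz * cut_ms / 1000):]


tone = synth_tone(440.0, 250)                 # 250 ms -> 2500 samples at 10 kHz
for cut in (20, 50):
    print(cut, len(remove_onset(tone, cut)))  # 20 ms -> 2300, 50 ms -> 2000 samples
```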

Multidimensional scaling (MDS) also has been an effective approach for understanding the spatial representation of a
listener's similarity and difference judgments among a given stimulus set (Grey & Moorer, 1977). In MDS, perceived similarities
or differences are used to represent subjective distance, and then to create a cognitive "map" that attempts to describe the
perceptual relationships among stimuli. Plomp (1976) used MDS to investigate the perception of steady-state portions of nine
synthesized instruments playing the same note. Scaling of timbral similarity judgments was highly correlated with the pattern of
the spectral (as opposed to temporal) envelope. Therefore, MDS research indicates that the physical pattern of energy in the
spectrum seems to form a basis of timbre judgments. Generally, similarity clustering seemed to correspond to class membership
of the instruments (e.g., strings, brass, woodwinds), indicating that timbre is largely determined by spectral composition.
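The step from dissimilarity judgments to a perceptual "map" can be sketched with classical (Torgerson) metric MDS. This is only one member of the MDS family and may differ from the scaling algorithms used by Plomp or by Grey and Moorer; the function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) metric MDS: embed items so distances approximate dissim."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]      # keep the largest n_dims
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Usage: distances among four points recover a configuration with the same distances.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(np.round(classical_mds(dist), 3))
```

The recovered coordinates are unique only up to rotation and reflection, which is why MDS dimensions must be interpreted (e.g., as spectral envelope) after the fact.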

In  order  to  evaluate  the  importance  of  onset  transitions  and  upper  harmonics  (thus  temporal  and  spectral  envelope)  in 
instrument  timbre,  the  present  study  included  an  investigation  of  normalization  for  stimuli  that  have  been  physically  altered.  If 
normalization  processes  exist  for  music  stimuli,  the  prime  factors  contributing  to  instrument  timbre  should  be  differences  in 
attacks  and/or  spectral  composition  among  instruments.  These  considerations  guided  our  selection  of  instruments  for  the  present 
investigation,  and  are  the  bases  for  the  physical  manipulations  in  later  experiments. 

Current  Investigation 

The current investigation was composed of three chord (triad) judgment experiments and one similarity scaling
experiment. Experiment 1 was a chord judgment study (patterned after speech normalization studies) that attempted to
demonstrate normalization effects for synthetic music stimuli. Experiment 2 was a chord identification study that used a subset
of the same stimuli to evaluate whether normalization is an active process.

Experiment 3 was composed of a series of scaling conditions in which natural instrument tokens were used. In one
condition, natural and synthetic tokens were compared to assess generalizability of the results from Experiments 1 and 2 (where
synthetic instruments were used). Two other scaling tasks were used to identify the importance of attack and spectral
composition in defining instrument timbre for our stimuli. Experiment 4 was another chord judgment experiment in which a
selected subset of the natural and altered-natural stimuli was used to evaluate the predicted relationships between perceived
similarity and normalization. The information obtained from these studies begins to provide a better understanding not only of
the nature of normalization, but also of which stimulus properties are 'normalized.'

Experiment  1 

The goal of Experiment 1 was to demonstrate normalization for music timbre by evaluating whether changes in
instrument alter the speed (but not the accuracy) with which subjects judge the equivalence of triads in an AX task. If findings
that characterize talker normalization also apply to music stimuli, judgments of triads should be faster when chords are played by
the same instrument. Conversely, slower reaction times should be obtained when triads are played by instruments with
significant timbral differences. Maintenance of performance accuracy in a multiple-source condition is critical to demonstrating
the operation of normalization processes, for the reasons described above for the active normalization hypothesis.

Method 

Subjects. Twenty-five undergraduate students enrolled in psychology courses at Binghamton University participated in
partial fulfillment of course requirements. Since subjects needed to distinguish between major and minor chords, all subjects
were asked to self-select based upon having at least minimal knowledge of music theory. Because of past experience with the
high variability in both the ability and motivation of subjects in this pool, we always establish a priori criteria for inclusion of
subject data. For the current study, the criterion was better-than-chance performance in all conditions. Four subjects failed
to meet this criterion. Data from one other subject were lost due to a computer malfunction. Thus, results for this experiment
are based on data from the remaining 20 subjects.

Stimuli. The stimuli consisted of chords played with five digitally sampled instrument sounds produced on a Roland synthesizer
keyboard. The five instruments were piano, harpsichord, violin, flute, and trumpet. These instruments were chosen on the basis
of the characteristic physical properties of the stimuli they typically produce. Previous research seemed to identify the attack
transition as an important property in defining instrument timbre. Naturally produced stimuli from the piano, harpsichord, and
brass instruments are all characterized as having a quick attack and rapid decay, whereas woodwinds and strings have a relatively
slow attack and gradual decay (Fletcher, 1991). The piano, flute, violin, and trumpet were chosen because these instruments are
readily discriminable from one another. The harpsichord and piano served as possibly confusable instruments because
of their similar timbres, which may be based upon attack, partials, or overall waveform properties. If the harpsichord and piano
are indeed easily confusable, one should expect to find faster reaction times for comparisons across these instruments relative to
comparisons across discriminable instruments.

There were two 871.5-ms samples of each instrument recorded for each of four triads: C-major (C-E-G), C-minor
(C-E♭-G), E♭-major (E♭-G-B♭), and E♭-minor (E♭-G♭-B♭), where possible in the same octave above middle C. This ordered chord
progression represents the degree of relatedness for any two given chords, with each listed chord differing from the immediately
preceding chord by one note; i.e., C-major differs from C-minor by one note, from E♭-major by two notes, and from E♭-minor
by all three notes. The chords were recorded on a high-bias chrome cassette, then converted to a 12-bit digital
representation at a 10-kHz sample rate with 4-kHz low-pass filtering. A 386-DOS computer was used on-line to randomize trial
order, present the stimuli, time events, and record responses. The stimuli also were 4-kHz low-pass filtered at presentation, and
were delivered binaurally over TDH-49 headphones in a commercial acoustic chamber.
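The note-difference structure of the four triads can be checked by treating each chord as a set of pitch classes and counting the notes of one chord that are absent from the other. This sketch uses ASCII spellings ("Eb" for E♭) and ignores octave; the names `CHORDS` and `note_difference` are hypothetical.

```python
# Each triad as a set of pitch-class names (octave ignored).
CHORDS = {
    "C-major": {"C", "E", "G"},
    "C-minor": {"C", "Eb", "G"},
    "Eb-major": {"Eb", "G", "Bb"},
    "Eb-minor": {"Eb", "Gb", "Bb"},
}


def note_difference(a, b):
    """Number of notes in chord a that do not occur in chord b (symmetric for triads)."""
    return len(CHORDS[a] - CHORDS[b])


for other in ("C-minor", "Eb-major", "Eb-minor"):
    print("C-major vs", other, "->", note_difference("C-major", other))
```

Adjacent chords in the listed order differ by one note each, while C-major and E♭-minor share no notes, which yields the one-, two-, and three-note contrasts used in the chord analysis.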

Procedure. Experiment 1 used an AX chord discrimination task that was blocked by instrument condition. Subjects
were instructed to judge whether or not the two chords presented on a given trial were equivalent (consisted of the same notes),
with emphasis placed on the need to disregard the instrument playing the notes. Following the procedures typically used in
normalization studies, instrument variability was manipulated within subjects by presenting stimuli in two blocks representing
distinct instrument conditions: Single Instrument and Mixed Instrument. The 120-trial Single Instrument condition consisted of
stimuli from only a single instrument within each trial, but stimuli from different instruments across trials. The 480-trial Mixed
Instrument condition consisted of chords played by different instruments both within and across trials. The order of
conditions was counterbalanced across subjects. Subjects were informed of the exact nature of each condition prior to the block
of trials. Within each condition there was an equal (0.50) probability of same and different chord presentations.
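One way to build blocks that match the stated counts is sketched below. The report does not say how trials were constructed, so this is only a design consistent with the numbers: each instrument pairing receives 12 same-chord trials (4 chords, 3 repetitions) and 12 different-chord trials (all 12 ordered pairs of distinct chords), giving 5 x 24 = 120 Single Instrument trials and 20 x 24 = 480 Mixed Instrument trials with exactly half "same." The choice between the two recorded samples per chord is ignored, and all names are hypothetical.

```python
import itertools
import random

INSTRUMENTS = ["piano", "harpsichord", "violin", "flute", "trumpet"]
CHORDS = ["C-major", "C-minor", "Eb-major", "Eb-minor"]


def chord_pairs():
    """24 (A, X) chord pairs per instrument pairing: half same, half different."""
    same = [(c, c) for c in CHORDS] * 3              # 12 'same' trials
    diff = list(itertools.permutations(CHORDS, 2))   # 12 'different' trials
    return same + diff


def make_block(condition, seed=None):
    """Return shuffled trials as (instrument_A, chord_A, instrument_X, chord_X)."""
    if condition == "single":
        inst_pairs = [(i, i) for i in INSTRUMENTS]                 # same instrument within a trial
    elif condition == "mixed":
        inst_pairs = list(itertools.permutations(INSTRUMENTS, 2))  # different instruments within a trial
    else:
        raise ValueError(f"unknown condition: {condition}")
    trials = [(ia, ca, ix, cx) for ia, ix in inst_pairs for ca, cx in chord_pairs()]
    random.Random(seed).shuffle(trials)              # randomize trial order
    return trials


block = make_block("mixed", seed=1)
print(len(block), block[0])
```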

A trial consisted of presentation of the A stimulus, a 1500-ms ISI, the X stimulus, and a 3000-ms response interval.
Subjects  were  instructed  to  respond  as  quickly  and  accurately  as  possible;  responses  were  indicated  by  pressing  one  of  two  keys 
(corresponding  to  same  vs.  different  chord)  on  a  response  pad.  All  responses  were  recorded  by  the  computer  that  measured  RT 
using  a  1  ms  time-base  and  prohibited  any  change  in  response. 

Results  and  Discussion 

Based upon speech normalization findings, subjects should be faster (and possibly more accurate) in the Single
Instrument condition than in the Mixed Instrument condition. Table 1 shows mean accuracy and RT results. The mean
RT values across subjects were obtained from individual median RT scores for correct responses only. The full set of RT and
accuracy data were subjected to separate 2 × 2 ANOVAs, with instrument condition (Single vs. Mixed) and chord (same vs.
different) serving as variables. The analysis of different chord comparisons was further broken down by the number of notes
between chords; discussion of this finer analysis will follow the separate discussion of general RT and accuracy results.
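The RT summary described above (each subject's median RT on correct trials, then the mean of those medians across subjects) can be sketched as follows. The trial-record format `(subject, correct, rt_ms)` and the function name are hypothetical; subjects with no correct trials in a condition simply drop out, a case the text does not address.

```python
from statistics import mean, median


def condition_rt(trials):
    """Mean across subjects of each subject's median RT on correct trials only.

    trials -- iterable of (subject, correct, rt_ms) records for one condition
    """
    by_subject = {}
    for subject, correct, rt in trials:
        if correct:                                   # errors are excluded from RT
            by_subject.setdefault(subject, []).append(rt)
    return mean(median(rts) for rts in by_subject.values())


data = [(1, True, 500), (1, True, 700), (1, False, 300), (1, True, 600),
        (2, True, 800), (2, True, 900)]
print(condition_rt(data))
```

Using each subject's median rather than mean limits the influence of occasional very slow responses before averaging across subjects.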




Insert Table 1 Here


The results in Table 1 show that subjects performed quite accurately in the Single Instrument condition and, in
comparison, very poorly in the Mixed Instrument condition [F(1, 19) = 164.78, p < 0.01]. This effect of instrument condition
occurred for both same and different chord trials [F(1, 19) = 134.03, p < 0.01, and F(1, 19) = 65.20, p < 0.01, respectively].

The reduced accuracy (on both same and different chord trials) for judgments across instruments indicates that subjects were
seldom able to ignore irrelevant stimulus differences associated with instrument timbre. Although responses on single
instrument trials were significantly faster than on mixed instrument trials [F(1, 19) = 17.85, p < 0.01], as hypothesized, the accuracy
data appear to indicate a general failure to normalize, and therefore do not allow the use of RT to evaluate differences in
processing (i.e., normalization). However, two subjects performed at relatively high levels of accuracy under both Single and
Mixed Instrument conditions, and their data (which are discussed later) allow some evaluation of normalization for instrument
timbre.

There was no main effect of same/different chord on accuracy [F(1, 19) = 0.48, p > 0.10], but there was a significant
chord by instrument condition interaction [F(1, 19) = 17.66, p < 0.01]. In the Single Instrument condition there was a significant
tendency to respond 'same' [F(1, 19) = 14.38, p < 0.01], whereas in the Mixed Instrument condition there was a nonsignificant tendency to
respond 'different' [F(1, 19) = 2.43, p < 0.14]. Thus, it appears that subjects may have responded as much on the basis of
overall timbre as on the basis of chord equivalence.

Chord Analysis. Table 2 also summarizes the effects of note differences for different chord trials across Single and
Mixed Instrument conditions. Discrimination of one-, two-, and three-note differences between chords represents increasing S,
and thus increasing S/N. Accuracy should increase, and RT should decrease, with greater S/N. Thus, it is not
surprising that performance was better for discrimination of three-note differences than for one- or two-note differences, but the
absence of any differences in RT was not expected. The lack of note-difference effects on RT may have been due to the high
error rates in the Mixed Instrument condition. There was a marginal main effect of same/different chord on RT [F(1, 19) =
3.95, p < 0.10], which is attributable to the significantly slower response times for different chord trials [F(1, 19) = 5.98, p <
0.05] in the Single Instrument condition (where accuracy was higher).

Instrument Analysis. Table 2 shows d' as a measure of accuracy for each instrument combination on the AX trials. d'
was computed for each listener based upon the probabilities that subjects responded 'same' to equivalent (hit) and nonequivalent
(false alarm) chords; d' then was averaged across subjects (Pastore & Scheirer, 1974). Table 2 also shows RT for the
instrument comparisons.
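A minimal d' computation from the hit and false-alarm rates defined above is sketched below, using the standard yes/no formula d' = z(H) - z(F) with 'same' as the signal response. The text does not say whether a same-different (AX) model correction was applied, so this formula is an assumption, as is the 1/(2N) adjustment that keeps rates of 0 or 1 finite; the function name is hypothetical.

```python
from statistics import NormalDist, mean


def dprime(hit_rate, fa_rate, n_same=None, n_diff=None):
    """Yes/no d' = z(H) - z(F), where a 'same' response counts as the signal response.

    If trial counts are given, rates of 0 or 1 are moved to 1/(2N) or 1 - 1/(2N)
    so that the inverse-normal transform stays finite (an assumed convention).
    """
    def clamp(p, n):
        if n is None:
            return p
        return min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf                          # inverse of the standard normal CDF
    return z(clamp(hit_rate, n_same)) - z(clamp(fa_rate, n_diff))


# d' computed per listener, then averaged across listeners.
rates = [(0.90, 0.20), (0.75, 0.30)]                  # hypothetical (H, F) per subject
print(mean(dprime(h, f) for h, f in rates))
```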


Insert Table 2 Here


Based  upon  the  summarized  research  on  instrument  timbre,  it  was  expected  that  the  similarity  in  attack,  and  possibly 
upper  partials,  of  the  stimulus  waveforms  would  have  differential  effects  on  response  time.  For  example,  a  piano-harpsichord 
comparison  (where  both  instruments  have  quick  attacks)  should  have  been  faster  and  more  accurate  than  a  piano-woodwind 
comparison  (where  woodwinds  have  relatively  slow  attacks). 

For both same and different chord trials, d' was always highest, and RT almost always fastest, for single instrument
comparisons. However, none of the expected effects based on instrument waveshape were found. One possible explanation for
the lack of instrument similarity effects may be that although the instruments were selected based upon expected differences in
attack, attack may not have been critical to distinguishing timbre in the tokens used. It is also possible that subjects
were not sufficiently experienced with music stimuli to effectively utilize specific timbral components such as attack; the lack of
normalization for most subjects is consistent with this possibility. Also, in mixed instrument comparisons, d' was higher and RT
faster (with a few exceptions) when the second stimulus was played by the piano. It may be that subjects, in general, have
more exposure to piano timbre or chords played on the piano (than to other instruments), and thus are more
efficient perceivers of this particular timbre. Other work from our laboratory is consistent with this conjecture (Hall & Pastore,
1993).

Testing  Musically  Proficient  Subjects 

Experiment 1 sought to demonstrate normalization in terms of increased response latencies, but not decreased accuracy,
in conditions where timbre differed within trials. Subjects performed quickly and accurately on single instrument chord
comparisons, and significantly more slowly, as well as with considerably less accuracy, on mixed instrument chord comparisons. These
overall results do not support a meaningful normalization process for music timbre. Although the subjects were
asked to self-select for participation based on musical experience, the accuracy results suggest that the majority of the participants
were not proficient musicians.

The results from Experiment 1 are consistent with suggestions that chord and timbre may be integral (in the Garner
(1974) sense). If two dimensions are truly integral, then subjects should not be able to normalize for one dimension when
perceiving the second dimension. For example, Krumhansl and Iverson (1992) found that pitch (of isolated tones) and timbre do
interact to some degree (that is, they are not perceived independently); subjects could not attend to the pitch of a tone without being
influenced by its timbre. Musical experience may be a factor in the degree of integrality between pitch and timbre. Wolpert
(1990) obtained results suggesting that timbre is more salient than pitch for nonmusicians than for musicians; alternatively, it seems
possible that nonmusicians may have the ability to separate chord from timbre, but have difficulty understanding what is
required of them. Consistent with both types of explanation, Beal (1985) reported that nonmusicians found it more difficult




Timbre  Normalization 


5 


than musicians to judge two chords as the same when they were played on different instruments. There also are a number of
other reports (e.g., Pitt, under review) that nonmusicians have difficulty separating pitch and timbre. Thus, music experience may
be important in the degree to which pitch and timbre interact.

Could the results of Experiment 1 have been due to the subjects' lack of substantial musical experience, or do
the results instead reflect limits on music perception processes that occur even for experienced listeners? Separate analyses of
data for musicians and nonmusicians may be required to investigate the possibility that music experience differentially affected
timbre and pitch perception.

Musical histories were known for 4 of the 20 subjects in Experiment 1. Two (Subjects 7 and 13) were proficient
musicians and performed very accurately in all conditions; their data thus allow some evaluation of normalization. Subject 21
had moderate musical experience, though less than Subjects 7 and 13, whereas Subject 2 had only about 1 year of
music experience.

Table 3 provides accuracy and RT data for the 4 subjects. Subject 7 performed at high levels of accuracy throughout
the experiment, with 100 and 96 percent correct on same chord trials for Single and Mixed Instrument conditions, respectively.
The RT data for this subject are somewhat consistent with normalization predictions: the added variability in the different
instrument conditions produced longer response times for same chord trials, although the difference was not significant (p > .10).
Thus, it appears that Subject 7 may have normalized for stimulus variability, maintaining accuracy at a cost to speed. Subject 13 also
performed at 100 percent correct for the same instrument conditions and at a good, but somewhat poorer, level of 92 percent correct
on different instrument conditions for same chord trials. Although RT was in the expected direction, the difference did not approach
significance (t(9) = .84, p > .50). Subjects 2 and 21 performed at 97 and 100 percent correct, respectively, for Single Instrument
trials. However, performance for these subjects was poorer for the different instrument trials, at 55 and 68 percent correct.

The RT data for these two subjects do follow the expected normalization patterns: RTs for different-instrument, same-chord
trials were significantly longer than for same-instrument, same-chord trials (t(9) = 3.52, p < .01, and t(9) = 5.69, p < .002, for
Subjects 2 and 21, respectively). Based on the results from our musically experienced subjects (7 and 13), it is possible that
testing clearly experienced musicians may resolve the accuracy problem and provide stronger support for an active model of
normalization. This was the primary goal of Experiment 2, which examines normalization as a function of musical proficiency.
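The t(9) values above imply 10 paired observations per subject, but the report does not say what was paired (for example, per-chord or per-block median RTs). A minimal dependent-samples t sketch, assuming each same-instrument value is matched with one different-instrument value, could look like this; the RT lists in the usage lines are hypothetical numbers, not data from the report.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Dependent-samples t statistic for matched values a[i], b[i], with df = n - 1."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical median RTs (ms) for 10 matched observations, for illustration only.
mixed = [812, 790, 845, 901, 770, 830, 860, 815, 880, 842]
single = [701, 720, 735, 760, 690, 745, 770, 712, 790, 731]
t, df = paired_t(mixed, single)
```

With 10 pairs the degrees of freedom come out as 9, matching the form of the statistics reported above.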


Insert  Table  3  Here 


Experiment  2 

The logic behind the second experiment begins with the assumption that normalization is an active process. When a
comparison is to be made between two sequentially presented stimuli, the first stimulus should set up expectations about the
parameters or nature of the processes to use in analyzing the second stimulus. Furthermore, the first stimulus need not be
auditory to generate expectations about a subsequent stimulus. Presentation of a visual cue for instrument should provide
sufficient information for the system to anticipate well-learned stimulus characteristics. One would expect valid cues (which
correctly cue expectations about the instrument playing the auditory stimulus) to produce faster response times than invalid visual
cues (which set the processes for an incorrect instrument). If normalization instead is a passive process, then expectations
become irrelevant for stimulus processing, and the nature of the visual cues should not influence response times.

Following this logic, Experiment 2 used a Posner (1980) cross-modality cuing paradigm to determine whether the
perceptual system can actively anticipate, or can only passively process, irrelevant stimulus (timbre) variability. Subjects with
known musical abilities were tested for reaction time effects at relatively high levels of performance accuracy. Since the visual
cues should not directly alter the S/N ratio, there should be no effect of valid or invalid visual instrument cues on subsequent
chord judgments if normalization only reflects a decrease in S/N. However, if normalization is an active process, then longer
reaction times would be predicted for invalid cues, reflecting the perceptual system preparing for irrelevant (timbre) variability
and/or determining that the expectations were inappropriate.

Method 

Subjects. Six subjects participated in Experiment 2. With the exception of Subject 6, who had studied an
instrument (piano) for only about 1 year, most subjects had over 5 years of music experience. The 4 subjects with known
musical histories from Experiment 1 also participated in Experiment 2. All subjects reported normal hearing.

Stimuli and Procedure. The experimental design was a speeded, single-stimulus, major/minor chord labeling task, and
employed a cuing procedure adapted from the work on visual attention by Posner and colleagues (Posner, 1980; Posner, Snyder,
& Davidson, 1980). Subjects were presented with a 500-ms visual cue followed by an ISI that randomly varied between 1 and
2.5 s in 500-ms increments. Based upon the visual work, the variable ISI was intended to eliminate general, temporally related,
anticipatory effects due solely to the presence of the visual cue. The ISI was followed by a C-major or C-minor chord from
Experiment 1 played by one of four instruments: piano, harpsichord, brass, or strings. The woodwind stimuli were not used in
Experiment 2 because subjects in Experiment 1 often reported that the instrument sample did not sound characteristic of its
natural counterpart.

The subjects' task was to identify each triad as major or minor by differential button presses on a keypad. Cues,
which were always orthogonal to (and conveyed no information about) whether a chord was major or minor, were neutral, valid,
or invalid with respect to the instrument playing the target. The cue "++++" was neutral with respect to instrument and was
presented with a probability of 0.17. Instrument cues were four-character representations of a given instrument (a four-letter
abbreviation for strings, "PINO" for piano, "BRAS" for brass, or "HARP" for harpsichord) and occurred on the remaining trials (p = 0.83).
Instrument cues, when presented, were valid with a probability of 0.80, and thus were valid with an overall probability of 0.66
(p = 0.80 × 0.83) across all trials; instrument cues were invalid with an overall probability of 0.17 (p = 0.20 × 0.83). Three
blocks of 200 trials were presented.
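The cue probabilities and ISI schedule above can be checked with a small trial generator. This is a minimal sketch under stated assumptions: the report does not say whether cue types, target instruments, chords, and ISIs were drawn independently on each trial or counterbalanced within a block, so here each factor is sampled independently, and the instrument names and dictionary layout are illustrative choices, not the original software.

```python
import random

P_INSTRUMENT_CUE = 0.83    # instrument cue shown on this share of trials
P_VALID_GIVEN_CUE = 0.80   # an instrument cue names the target instrument
ISIS_MS = (1000, 1500, 2000, 2500)  # 1 to 2.5 s in 500-ms steps
INSTRUMENTS = ("piano", "harpsichord", "brass", "strings")

# Overall cue-type probabilities implied by the design.
P_VALID = P_INSTRUMENT_CUE * P_VALID_GIVEN_CUE          # 0.664, reported as 0.66
P_INVALID = P_INSTRUMENT_CUE * (1 - P_VALID_GIVEN_CUE)  # 0.166, reported as 0.17
P_NEUTRAL = 1 - P_INSTRUMENT_CUE                        # 0.17


def make_trial(rng: random.Random) -> dict:
    """Draw one trial; each factor is sampled independently (an assumption)."""
    target = rng.choice(INSTRUMENTS)
    chord = rng.choice(("major", "minor"))  # cues carry no chord information
    isi_ms = rng.choice(ISIS_MS)
    u = rng.random()
    if u < P_NEUTRAL:
        cue, cue_type = "neutral", "neutral"
    elif u < P_NEUTRAL + P_VALID:
        cue, cue_type = target, "valid"
    else:
        cue = rng.choice([i for i in INSTRUMENTS if i != target])
        cue_type = "invalid"
    return {"cue": cue, "cue_type": cue_type, "isi_ms": isi_ms,
            "target": target, "chord": chord}


rng = random.Random(1)
block = [make_trial(rng) for _ in range(200)]
expected = {name: round(200 * p, 1) for name, p in
            (("valid", P_VALID), ("invalid", P_INVALID), ("neutral", P_NEUTRAL))}
```

For a 200-trial block the expected counts are about 132.8 valid, 33.2 invalid, and 34 neutral trials; a counterbalanced design would instead fix these counts per block.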

Results  and  Discussion 

Figure 1 shows mean reaction times obtained from individual median scores for correct responses on valid, neutral,
and invalid cue trials. In the figure, subjects are ordered by increasing RT for the neutral cue condition. Standard error
bars also are provided. A one-way ANOVA of RT revealed a significant main effect of cue validity [F(2,5) = 6.58, p < 0.05].
This effect was primarily a result of consistently longer reaction times on invalid trials relative to valid trials (Tukey test, p <
0.05). Response latencies for neutral trials also were consistently longer than those for valid trials, but this difference did
not reach statistical significance (Tukey test, p > 0.05).
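The RT summary described above (each subject's median over correct responses, then the mean of those medians per cue condition) can be sketched as follows. The tuple layout of a trial record is an assumption made for illustration, and the example trials are hypothetical.

```python
from statistics import mean, median


def cue_condition_means(trials):
    """Mean across subjects of each subject's median correct-response RT.

    `trials` is an iterable of (subject, cue_type, rt_ms, correct) tuples.
    """
    rts = {}
    for subject, cue_type, rt_ms, correct in trials:
        if correct:  # only correct responses enter the RT analysis
            rts.setdefault((subject, cue_type), []).append(rt_ms)
    medians_by_cue = {}
    for (_subject, cue_type), values in rts.items():
        medians_by_cue.setdefault(cue_type, []).append(median(values))
    return {cue: mean(meds) for cue, meds in medians_by_cue.items()}


# Hypothetical trials for two subjects, for illustration only.
example = [
    (1, "valid", 500, True), (1, "valid", 600, True), (1, "valid", 900, False),
    (2, "valid", 700, True), (1, "invalid", 800, True), (2, "invalid", 1000, True),
]
summary = cue_condition_means(example)
```

Using medians per subject limits the influence of occasional very slow responses before results are averaged across subjects.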


Insert  Figure  1  Here 


Accuracy differed little across conditions: the two most highly practiced musicians (Subjects 1 and 3 in
the current experiment, and Subjects 7 and 13, respectively, in Experiment 1) performed at essentially 100 percent correct, and
Subjects 2, 4, and 5 performed at better than 90 percent correct regardless of cue and target instrument. These ceiling
effects were not viewed as a problem because we sought to demonstrate an active normalization process through RT effects in
the context of consistently high accuracy levels. Subject 6 (a relatively inexperienced musician with the least
training of the 6 subjects) performed at approximately 79 percent correct and also was the only subject whose
responses on valid cue trials were not significantly faster than on invalid trials.

The relatively accurate responding of subjects across all conditions is consistent with Krumhansl and Iverson's (1992)
conclusion that "the interaction between timbre and pitch of single tones does not imply that it is impossible to abstract and
compare pitches of tones with different timbres or timbres of tones with different pitches. The increased reaction times in tasks
requiring this information to be abstracted indicates that this process requires additional time" (p. 749).

It was anticipated (as in Experiment 1) that the relationship between the expected properties of the cued instrument's
waveform and the perceived properties (based on waveshape) of the target instrument would influence RT and accuracy. Table 4
reports mean reaction times for each possible combination of cue and target instrument. Although no consistent effects of
waveshape were obtained, there were some notable trends in the RT data. RT appears to have been fastest for the piano in both
valid and neutral cue trials. For 8 of the 12 invalid cue comparisons, reaction times for trials where both instruments had a
rapid attack were faster than for trials with slower attack instruments. For instance, the piano-harpsichord comparison was faster
than the harpsichord-string comparison. The results also show that the harpsichord-piano comparisons were, by far, the fastest
of all invalid cue trials. These results might be due to the similar attack functions of the piano and harpsichord, but might also
be due to a combination of high familiarity with the piano as a chord instrument and similar overall waveshape properties
of the harpsichord and piano. The fast reaction times for harpsichord-piano comparisons also provide an important basis
for later experiments that investigate the role of perceptual similarity in normalization; based upon an implicit assumption from
demonstrations of speaker normalization (Logan, 1990), similar tokens should be processed faster than dissimilar tokens.
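The count of 12 invalid cue comparisons follows from the design: each of the 4 instruments can be cued while one of the 3 other instruments plays. The sketch below enumerates those ordered (cue, target) pairs and flags the ones in which both instruments have a rapid attack; treating piano and harpsichord as the rapid-attack group is an assumption drawn from the discussion above, not a classification given in Table 4.

```python
from itertools import permutations

INSTRUMENTS = ("piano", "harpsichord", "brass", "strings")
# Assumption for illustration: piano and harpsichord form the rapid-attack group.
RAPID_ATTACK = {"piano", "harpsichord"}

# Invalid trials pair a cued instrument with a different target instrument,
# so every ordered pair of distinct instruments is one invalid combination.
invalid_pairs = list(permutations(INSTRUMENTS, 2))  # (cue, target)
both_rapid = [pair for pair in invalid_pairs if set(pair) <= RAPID_ATTACK]

print(len(invalid_pairs))  # 12
print(both_rapid)          # [('piano', 'harpsichord'), ('harpsichord', 'piano')]
```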


Insert  Table  4  Here 


The results of Experiment 2 clearly provide evidence that normalization is an active process. Subjects were
significantly slower when the visual cue was invalid than when the cue was either valid or neutral, indicating a cost for
inappropriate perceptual expectations. This cost is an increase in processing time: the listener must first identify the
inaccurate perceptual setting and then readjust for the actual source variability.

The use of digitally sampled, rather than natural, instrument tokens may have limited the observation of consistent
waveshape effects. Subjects in both experiments often reported that the harpsichord samples did not sound characteristically like
the natural instrument; such unnatural timbre had already been identified as a problem for the woodwind samples in Experiment
1, and was the basis for not using those stimuli in Experiment 2. Therefore, it is possible that some stimulus properties
contributing to timbre in our synthetic tokens may have prevented consistent normalization to expected stimulus properties that
otherwise could have been in evidence. Thus, such additional properties may have limited the already significant main effects of
cuing in Experiment 2. This possibility will be investigated in Experiment 3 by using natural instrument tokens. In addition,
waveform manipulations of natural tokens may help identify the salient components of timbre, which should, in turn, be
important for normalization.

Experiment  3 

If normalization is to be considered an active process, a pertinent question becomes: what physical stimulus properties
are actually being factored out, or normalized? The goal of Experiment 3 was to identify global stimulus properties that
contribute to the characteristics of particular instrument timbres, and thus form the basis for perceived instrument variability that
may be subject to normalization. A number of studies (see the introduction) have indicated that attack functions, and possibly
spectral composition, are largely responsible for instrument timbre. Experiment 3 provides an evaluation of the importance of
attack and spectral composition (in terms of upper partials) in defining the similarity (or, conversely, the distinctiveness) of
instrument timbres.

Natural instrument tokens were physically altered to evaluate possibly critical components of instrument timbre. Two
types of physical alterations were applied to the natural stimuli: (1) removal of the attack portion of each stimulus, and (2)
removal of all higher-order partials. Experiment 3 thus used 3 sets of "natural" stimuli: (1) full (intact) versions of the natural
tokens, (2) tokens with the attack portions removed ("cut-attack"), and (3) filtered tokens. The manipulated stimuli represent
extremes, in that attack transitions and most partials were completely eliminated. Thus, if the attack and/or upper partials play
significant roles in instrument timbre, eliminating the given property should result in stimuli that are highly similar to each
other and dissimilar to the original, unaltered stimuli.

A similarity scaling procedure was used to evaluate the importance of the attack and upper partials as components of
instrument timbre. Subjects were randomly assigned to one of three conditions. In Condition 1, the synthetic stimuli from
Experiment 1 were compared to their intact natural counterparts. This comparison allowed us to assess whether the
Experiment 2 results obtained using synthetic stimuli generalize to natural instrument tokens. In Condition 2, natural instrument tokens
with the attack portions removed were compared to intact natural stimuli. If the attack component of a chord is a significant
component of timbre, instrument tokens with missing attack transitions should be perceived as highly similar to each other and
not very similar to intact versions of the same instrument. In Condition 3, low-pass filtered natural tokens were compared to
intact natural stimuli. By the same logic as in Condition 2, if the higher-order partials are significant components of timbre, filtered
stimuli should be perceived as highly similar to each other and not very similar to intact versions of the same instrument.

Method 

Subjects. Forty-two students enrolled in psychology courses at Binghamton University served as subjects (14 subjects
for  each  of  the  3  conditions)  in  partial  fulfillment  of  course  requirements.  All  subjects  reported  normal  hearing  and  had  at  least 
2  years  of  musical  experience. 

Stimuli. The stimuli consisted of chords produced by five natural instruments. The instruments were the same as
those used in Experiment 1. A practiced musician on each instrument produced the isolated notes C, E, E♭, and G, each
approximately 870 ms in duration. The samples were recorded on a half-track tape recorder (Tandberg TD 20A operating at 15
ips) using a B&K Model 4135 microphone. The recorded stimuli were digitized (12-bit resolution, 10-kHz sample rate, 4-kHz
low-pass antialiasing filter). The stimuli were then digitally mixed to produce C-major chords using a 386-DOS computer. Thus, all
chords were created as though three musicians simultaneously played the component notes, even if the full chord could have
been played on a single instrument (e.g., piano or harpsichord).
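Digitally mixing separately recorded notes amounts to adding their sample values. The report does not describe how levels were handled during mixing, so this minimal sketch makes one assumed choice: it sums the notes and rescales the result so the chord's peak matches the loudest input note. Pure tones at hypothetical C4, E4, and G4 frequencies stand in for the recorded notes.

```python
import math

RATE_HZ = 10_000  # sample rate reported for the digitized notes


def mix_notes(notes):
    """Sum note recordings sample by sample (truncated to the shortest note),
    then rescale so the chord's peak magnitude equals the loudest note's peak."""
    length = min(len(n) for n in notes)
    chord = [sum(n[i] for n in notes) for i in range(length)]
    note_peak = max(max(abs(s) for s in n) for n in notes)
    chord_peak = max(abs(s) for s in chord)
    scale = note_peak / chord_peak if chord_peak else 0.0
    return [s * scale for s in chord]


def tone(freq_hz, dur_s=0.87, amp=0.3):
    """Pure-tone stand-in for a recorded note (hypothetical)."""
    n = round(dur_s * RATE_HZ)
    return [amp * math.sin(2 * math.pi * freq_hz * i / RATE_HZ) for i in range(n)]


# Hypothetical C4, E4, G4 fundamentals; the report does not give the octave.
c_major = mix_notes([tone(261.63), tone(329.63), tone(392.00)])
```

Rescaling after summation keeps the mixed chord from exceeding the range of the 12-bit samples, which a plain sum of three notes could otherwise do.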

The cut-attack stimuli consisted of intact stimuli that were digitally edited to remove the attack functions. The most
intense portion of each token was identified to determine the length of the attack function. The attack (the stimulus portion
prior to and including peak amplitude) then was excised from the waveform at a waveform zero-crossing. In order to prevent
any sudden onset transients, each cut-attack stimulus then was amplitude-weighted by imposing a constant, brief (30-ms) linear
onset ramp.
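The cut-attack edit can be sketched as three steps: locate the peak, start the kept portion at the first zero crossing after it, and apply a 30-ms linear ramp. This is an illustrative reconstruction, not the original editing software; in particular it takes the single largest-magnitude sample as the "most intense portion", whereas the original analysis may have used an amplitude envelope.

```python
def cut_attack(samples, rate_hz=10_000, ramp_ms=30):
    """Drop everything up to and including the peak sample, start at the next
    zero crossing, and apply a linear onset ramp to the remaining samples."""
    if not samples:
        return []
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    start = None
    for j in range(peak + 1, len(samples)):
        # A zero crossing: the sample is exactly zero or its sign differs
        # from the previous sample's sign.
        if samples[j] == 0 or (samples[j - 1] < 0) != (samples[j] < 0):
            start = j
            break
    if start is None:
        return []  # no zero crossing after the peak; nothing usable remains
    tail = samples[start:]
    ramp_len = min(len(tail), round(rate_hz * ramp_ms / 1000))
    return [s * i / ramp_len if i < ramp_len else s for i, s in enumerate(tail)]


# Hypothetical toy waveform; at rate_hz=100 a 20-ms ramp spans 2 samples.
toy = cut_attack([0.1, 0.5, 1.0, 0.4, -0.2, -0.3, 0.2, 0.3], rate_hz=100, ramp_ms=20)
```

At the reported 10-kHz sample rate, the 30-ms ramp covers 300 samples.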


The filtered stimuli consisted of intact chords that were low-pass filtered at 500 Hz, with a 72-dB/octave skirt. For
these stimuli, most of the higher-order partials were removed, leaving the fundamental plus, at most, one partial from the C note
and, possibly, one partial from the E note. Individual tokens were attenuated to equate overall peak amplitude across stimuli.
Time-varying spectrograms for each of the stimuli were examined to confirm the effectiveness of the temporal and spectral
manipulations performed on the stimuli. Spectrograms showed that the onset portion of each waveform was absent for all the
stimuli in the cut-attack manipulation. Similar analyses for the filtered stimuli showed that most of the higher harmonics were
removed. Both manipulations also necessarily eliminated the relative onsets of the upper partials, which is a potential interactive cue
for timbre.
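A 72-dB/octave skirt above a 500-Hz cutoff can be turned into per-harmonic attenuation with a straight-line approximation, attenuation ≈ 72 · log2(f / 500) dB for f above the cutoff. The sketch below assumes that idealized response (a real filter's knee is rounded) and hypothetical C4, E4, and G4 fundamentals, since the report does not state the octave.

```python
import math

CUTOFF_HZ = 500.0
SKIRT_DB_PER_OCTAVE = 72.0


def attenuation_db(freq_hz: float) -> float:
    """Idealized straight-line low-pass response: flat up to the cutoff, then
    72 dB of attenuation per octave (a real filter has a rounded knee)."""
    if freq_hz <= CUTOFF_HZ:
        return 0.0
    return SKIRT_DB_PER_OCTAVE * math.log2(freq_hz / CUTOFF_HZ)


# Hypothetical fundamentals (C4, E4, G4); the report does not state the octave.
for name, f0 in (("C", 261.63), ("E", 329.63), ("G", 392.00)):
    for k in (1, 2, 3):
        f = k * f0
        print(f"{name} harmonic {k}: {f:6.1f} Hz, attenuated {attenuation_db(f):5.1f} dB")
```

Under these assumptions the second harmonics of C, E, and G would be attenuated by roughly 5, 29, and 47 dB, respectively, which is in line with the description of at most one surviving partial from C and possibly one from E.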

Procedure. Subjects were instructed to judge the similarity of the two stimuli presented on each trial. A 7-point scale
was used, with "1" indicating minimal similarity and "7" indicating maximal similarity. Each condition was composed of three
parts. In Part 1, all stimuli in the experiment were presented sequentially to give the subjects an idea of the range of the
different instrument tokens. In Part 2, subjects were given an example of a very dissimilar and a very similar pair of stimuli.
Different stimulus pairs were used as examples, depending on the condition. These examples were not expected to produce
demand characteristics, since the subjects were told that the pairs should not be considered the maximally similar or dissimilar
comparisons on which to base their later judgments, but rather just examples of similar and dissimilar items. No data were
collected in these initial familiarization sections of the experiment. In Part 3, similarity ratings were collected on each of 450
trials. A trial consisted of the first stimulus, a 1500-ms ISI, and presentation of the second stimulus. Subjects were asked to
respond within a 3000-ms response interval by pressing keys corresponding to 1-7 on a response pad; these responses were
recorded by the computer.
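The 450 rating trials are consistent with 10 stimuli per condition (5 instruments, each in two versions), giving 10 × 9 = 90 ordered pairs of distinct stimuli, each presented 5 times. That decomposition is an inference from the numbers reported here, not something the report states. A minimal sketch of such a trial list, with instrument names taken from the MDS discussion and a seeded shuffle for a random presentation order:

```python
import random
from itertools import permutations

# Natural instruments named in the MDS discussion.
INSTRUMENTS = ("flute", "trumpet", "violin", "piano", "harpsichord")
VERSIONS = ("intact", "altered")  # altered = synthetic, cut-attack, or filtered
REPETITIONS = 5                   # 450 trials / 90 pairs (inferred, not stated)

stimuli = [(inst, ver) for inst in INSTRUMENTS for ver in VERSIONS]  # 10 tokens
pairs = list(permutations(stimuli, 2))  # ordered pairs, no self-pairs: 10 * 9 = 90
trials = [pair for pair in pairs for _ in range(REPETITIONS)]
random.Random(0).shuffle(trials)        # random presentation order
```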
Results  and  Discussion 

For every condition, the average similarity rating was calculated across subjects for each of the 90 stimulus pairs. The
mean similarity ratings were submitted to a multidimensional scaling program (SYSTAT) using a Euclidean metric (Minkowski
metric with r = 2), which is appropriate for integral dimensions (Garner, 1974). (Mean ratings were also analyzed using a
city-block metric (r = 1), which is appropriate for separable dimensions. Since both metrics yielded highly similar solutions, only
the Euclidean solution is presented and discussed.)
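The two metrics differ only in the exponent r of the Minkowski distance, d(x, y) = (Σ |x_i − y_i|^r)^(1/r). The function below is a generic illustration of that formula, not SYSTAT's implementation; an MDS program places stimuli so that distances of this kind track the rated dissimilarities.

```python
def minkowski(x, y, r=2.0):
    """Minkowski distance; r = 2 is Euclidean, r = 1 is city-block."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


p, q = (0.0, 0.0), (3.0, 4.0)
print(minkowski(p, q, r=2))  # 5.0
print(minkowski(p, q, r=1))  # 7.0
```

The Euclidean metric is unaffected by rotating the axes of the space, while the city-block metric is tied to the particular axes, which is why the two are associated with integral and separable dimensions, respectively.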

For Condition 1, where full versions of the natural and synthetic stimuli were compared, a 2-dimensional solution was
obtained (stress value below 0.10). Panel A of Figure 2 shows the MDS space based on Dimensions 1 and 2. Table 5 provides a
key to the instrument labels used in the figure. With the possible exception of the harpsichord and woodwinds (flute),
for which similarity was still quite high, similarity between the natural instruments and their synthetic counterparts was very
high.

Insert  Figure  2  and  Table  5  Here 


Dimension 1 was related to the resonant properties of the instruments: the wind instruments
(flute/woodwind and trumpet/brass, whose sounds emanate from a tube) were grouped together, while the string instruments
(violin, piano, harpsichord) also occupied a similar region of the space. Dimension 2 appeared to be related to degree of spectral fluctuation.
For example, flutes tend to have upper harmonics that rise in amplitude (at onset) and decay (at offset) in close alignment,
whereas string and brass instruments (which grouped separately along Dimension 2) tend to have varied amplitude patterns for
individual partials at onset and offset (see Grey, 1977).

With the exception of the woodwind (flute) and harpsichord (both noted as unnatural stimuli in Experiments 1 and 2),
the synthetic instrument samples appear to have been adequate samples of their natural instrument counterparts. Table 6
supports this assertion by providing mean similarity ratings for comparisons of synthetic and natural instrument counterparts.
Although the synthetic and corresponding natural tokens were generally very similar, the finding that they were not maximally
similar to each other suggests that there are at least some subtle, but perceivable, differences between the two types of stimuli.
This finding indicates that the results from Experiments 1 and 2 are reasonably valid, but also supports our rationale for using
natural instruments in the remainder of this experiment and in the last experiment.


Insert Table 6 Here


Condition 2 (natural intact vs. natural cut-attack) resulted in a 2-dimensional solution (stress value of 0.068), which is
shown in Panel B of Figure 2. Each cut-attack stimulus was highly similar to the intact version from which it was derived. In
fact, the trumpet (Tn and Tc) stimuli were maximally similar to each other. Based upon the position of the groupings in the
perceptual space, Dimension 1 appeared to be related to resonant properties of the instruments, while Dimension 2 was related to
degree of spectral fluctuation (both summarized above).

The cut-attack manipulation did not appear to have any significant effect on similarity relative to the intact
counterparts. Panel A of Figure 2 (which illustrates the MDS space for the intact stimuli) and Panel B of Figure 2
(which depicts the MDS space for intact and cut-attack stimuli) show approximately equivalent similarity spaces for the intact
and cut-attack tokens, with the elimination of the attack not meaningfully affecting the similarity of the altered stimulus to its
original counterpart. This finding is contrary to Grey's conclusion that onset functions are important in instrument
identification. Although it is possible that some aspect of the synthetic nature of Grey's stimuli may have contributed to the
greater importance of attack noted in his study, this is unlikely given that his stimuli were based directly on natural tokens using
an analysis-by-synthesis approach. A more likely reason for this difference in results could be related to the lengths of the
stimuli, as suggested by Handel (1989). Grey's stimuli, which were synthetic single tones, ranged from 250 to 500 ms, whereas in
the current study the natural chord stimuli were approximately 870 ms. The onset transitions in the current study thus could
have constituted a much less significant portion of the stimuli than those used in the Grey study. If this conjecture is valid,
then removal of the attack functions should lead to significant decrements in instrument identification for shorter stimuli. We
will address this argument more completely in the general discussion.
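The stimulus-length argument is simple proportion arithmetic: an attack of fixed duration occupies a smaller share of a longer stimulus. The sketch assumes a hypothetical 50-ms attack, the same for every instrument; the report gives no attack durations, so only the relative comparison is meaningful.

```python
def attack_fraction(attack_ms: float, duration_ms: float) -> float:
    """Share of a stimulus occupied by an attack of the given duration."""
    return attack_ms / duration_ms


ATTACK_MS = 50.0  # hypothetical attack duration, not a value from the report
for label, duration_ms in (("Grey, shortest tones", 250), ("Grey, longest tones", 500),
                           ("current chords", 870)):
    print(f"{label}: {attack_fraction(ATTACK_MS, duration_ms):.1%}")
```

With a 50-ms attack, the share falls from 20% and 10% of Grey's 250- and 500-ms tones to under 6% of the 870-ms chords.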

Condition 3 (natural intact vs. natural filtered) also resulted in a 2-dimensional solution (stress value of 0.048). Panel
C of Figure 2 provides the MDS space for the filtered and intact stimuli based on Dimensions 1 and 2. Dimension 1 was
related to the presence/absence of higher overtones; the filtered stimuli were all highly similar to each other and
minimally similar to their intact versions. Dimension 2 was related to the resonant properties of the instruments (as discussed
above). Spectrograms of the intact flute tokens verified that the flute is characterized by very weak higher harmonics,
so the filtering procedure was not expected to (and did not) have a dramatic effect on flute timbre.

Experiment  4 

In order to better establish the relationship(s) between listener expectations and physical characteristics of stimuli,
Experiment 4 used the intact and the two sets of physically altered stimuli in a major/minor chord discrimination task. If some
type of normalization process (one based upon adjusting for differences between two stimuli) is indeed invoked, instrument
comparisons that resulted in low similarity ratings should result in increased reaction times. Conversely, instrument comparisons
that were judged to be very similar should produce a minimal change in S/N and thus a minimal need for normalization, as
demonstrated by faster reaction times. The results of Experiment 4 will thus directly evaluate the roles of the global
timbral properties identified in Experiment 3 in normalization.

Method 

Subjects. Twelve listeners from Binghamton University, each with at least 5 years of music experience, served as
subjects  for  this  experiment.  All  subjects  reported  normal  hearing  and  were  paid  $5  per  hour  for  their  participation. 

Stimuli and Procedure. C-major and C-minor chords were used as stimuli. Experiment 4 used a chord discrimination
task similar to the one used in Experiment 1. Full, intact stimuli were paired with full, cut-attack, and filtered stimuli from the
same instrument or from different instruments. In order to limit both the total number of trials and stimulus uncertainty, the
first chord presented was always a C-major chord. This standard stimulus was followed, after a 1500-ms ISI, by either a C-major
or C-minor chord. The experiment consisted of 720 trials, with brief rest periods provided between blocks of 120
trials. Subjects were instructed to respond as quickly and accurately as possible. A maximum of 3000 ms was allowed to press
one of two keys, corresponding to judgments of "same" and "different" chord, on a response pad. All responses were recorded
on-line by a computer that measured RT with 1-ms accuracy.

Results  and  Discussion 

Subjects performed at high levels of accuracy, averaging 94 percent correct. Reaction times were obtained from individual
median scores for correct responses only. A linear regression analysis was performed on the scaling data obtained from
Experiment 3 and the reaction times for the corresponding stimulus pairs from Experiment 4, to examine possible systematic
changes in RT as a function of stimulus similarity. The regression line yielded an r² of 0.532 and indicated that as similarity
increases there is a corresponding, fairly consistent decrease in RT. This significant correlation appears to have been lowered
by two outlying scores. Both cases involved comparisons of a full stimulus followed by a cut-attack stimulus, and reaction times
for these comparisons were faster than predicted. Because these two comparisons were different-instrument conditions, slower
reaction times had been predicted. We suspect that the cut-attack manipulation, when presented as the second stimulus, may have
speeded RT to some extent because peak amplitude is reached much earlier for these stimuli. Thus, critical information necessary
to identify a given timbre may have been obtained earlier by the listener.
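The two analysis steps described above can be sketched in Python. First, each subject's RTs are reduced to a median over correct responses only. Second, condition RT is regressed on rated similarity. This is a minimal illustration, not the authors' analysis code. The (rt_ms, correct) trial layout and the function names are hypothetical, and the toy numbers in the usage lines are invented for illustration. They are not data from Experiments 3 or 4.

```python
from statistics import median


def median_correct_rt(trials):
    """Median RT (ms) over correct trials only.

    `trials` is a list of (rt_ms, correct) pairs for one subject in one
    condition; this record layout is a hypothetical choice.
    """
    rts = [rt for rt, correct in trials if correct]
    if not rts:
        raise ValueError("no correct trials in this condition")
    return median(rts)


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matched x and y with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r^2 equals the squared Pearson correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, slope, r_squared


# Toy usage (invented numbers): RT falls as rated similarity rises.
subject_trials = [(900, True), (1500, False), (1000, True), (1100, True)]
print(median_correct_rt(subject_trials))  # -> 1000 (the error trial is ignored)
similarity = [1, 2, 3, 4]
condition_rt = [1300, 1250, 1200, 1150]
print(linear_fit_r2(similarity, condition_rt))  # -> (1350.0, -50.0, 1.0)
```

A negative slope corresponds to the reported pattern of faster responses for more similar pairs. The r² value summarizes how consistently RT tracks similarity across conditions.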

To investigate the possibility that subjects were able to respond more quickly to the cut-attack stimuli, another brief chord
judgment experiment (similar to Experiment 4) was conducted, using only the full and cut-attack stimuli and manipulating the order
of presentation. On any given trial the cut-attack stimulus could be presented as the first or second stimulus. As suspected, a
one-way ANOVA revealed the predicted significant effect of order of presentation (F(3,4) = 6.28, p < 0.01). Based on this
information, a second linear regression was computed using the original data, but omitting the two conditions where a cut-attack
instrument token was presented as the first stimulus. Figure 3 shows the results of the second regression, where the new value of
r² (0.689) is indeed higher than that obtained in the first regression.
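The order-of-presentation test above is a one-way ANOVA. The text does not state whether order was manipulated between or within subjects. The sketch below therefore assumes independent groups; a repeated-measures design would need a different error term. The group contents and the function name are hypothetical. The function computes only the F ratio and its degrees of freedom. A p value would then come from the F distribution (for example, SciPy's `scipy.stats.f.sf`).

```python
def one_way_anova_f(groups):
    """F ratio for a between-groups one-way ANOVA.

    groups: list of lists of scores (for example, median RTs for each
    presentation order). Returns (F, df_between, df_within).
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more scores than groups")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups SS: size-weighted squared deviations of group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups SS: deviations of each score from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # -> (13.5, 1, 4)
```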


Insert  Figure  3  Here 


The results of Experiment 4 demonstrate that the timbral similarity of two items is an important predictor of processing time
for normalization. Furthermore, timbre depends primarily on the information present in the upper harmonics of instrument tokens
(Experiment 3). Increases in processing time reflect the degree to which timbral differences between stimuli must be factored out.
Highly similar tokens are processed faster since there is less variability to adjust. Correspondingly, additional time is required
to normalize greater stimulus variability.

General  Discussion 

The current investigation has shown that normalization processes for task-irrelevant source variability are not unique to
speech. Thus, the present nonspeech finding of timbre normalization in chord identification suggests that normalization may
reflect a general auditory perceptual mechanism. We note from experience (e.g., Hall & Pastore, 1992) the difficulty of
providing a strong empirical evaluation of claims concerning whether or not speech is mediated by a specialized, biological
mechanism for processing speech (Liberman & Mattingly, 1985, 1989; Whalen & Liberman, 1987). We therefore simply
acknowledge attempts of previous normalization studies (Pisoni, Carrell, & Gans, 1983) to address that issue, and focus our
discussion on the implications of normalization for the nature of perceptual processing.

Implications  for  Perceptual  Processing 

The perceptual system appears to have the ability to factor out, or at least partition, information associated with
irrelevant features prior to complete identification of the relevant features. By demonstrating such partitioning of
information, the current series of experiments provides some important insights into general aspects of perceptual processing,
particularly in terms of attention to specific features.

Experiment 1 showed that variation in timbre results in significant performance decrements for both accuracy and RT.
These results may reflect a failure on the part of most subjects to normalize effectively, with only two highly practiced musicians
demonstrating a cost to RT without a decrement in accuracy when instrument was varied (see "Effects of Music Experience,"
below). Thus, normalization may reflect the use of acquired knowledge in an efficient, possibly automatic fashion. Experiment
2 demonstrated that normalization has an active, anticipatory component, rather than being solely a passive response to
increased stimulus variability (or decreased S/N ratio). In this active process, the perceptual system does not appear simply to
evaluate each attribute by combining the values of the limited set of relevant features. Rather, it seems first to set some
processing parameters (based on expected stimulus properties) and then to evaluate the adequacy of those settings. If the settings
are incorrect, the system can modify them, but with a loss of time. This loss of time is reflected in a greater cost for an invalid
cue than benefit for a valid cue (each relative to a neutral cue). This asymmetry of cost to benefit has often been reported by
Posner (1980) for visual stimulus processing. Therefore, normalization has been demonstrated to be an active, adaptive type of
stimulus processing. By the same token, it may be reasonable to conjecture that limits of time or processing capacity would
decrease accuracy. The perceptual system would then either reset processing parameters or use a more global processing strategy.
Either outcome should increase noise (N), and thus decrease the S/N ratio.
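The cost-benefit asymmetry can be made concrete with the mean RTs reported in Table 4. These are 728.30 ms for valid cues, 789.76 ms for neutral cues, and 972.89 ms for invalid cues, each averaged over cue-instrument combinations. Following the usual Posner-style convention, benefit is taken here as neutral minus valid RT, and cost as invalid minus neutral RT. The function name is ours. The sketch also ignores the across-subject variability described by the standard errors in Table 4.

```python
def cue_cost_benefit(valid_rt, neutral_rt, invalid_rt):
    """Return (cost, benefit) in ms relative to a neutral-cue baseline."""
    benefit = neutral_rt - valid_rt    # speed-up produced by a valid cue
    cost = invalid_rt - neutral_rt     # slow-down produced by an invalid cue
    return cost, benefit


# Condition means from Table 4 (Experiment 2).
cost, benefit = cue_cost_benefit(valid_rt=728.30, neutral_rt=789.76, invalid_rt=972.89)
print(f"cost = {cost:.2f} ms, benefit = {benefit:.2f} ms, ratio = {cost / benefit:.2f}")
# prints: cost = 183.13 ms, benefit = 61.46 ms, ratio = 2.98
```

On these means, the invalid-cue cost is roughly three times the valid-cue benefit. This is the asymmetry the text attributes to the time needed to reset processing parameters.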

The perceptual system may always survey the stimuli and determine the appropriateness of the existing setting or algorithm
(from a previous trial or from a cue). If the system determines that the normalization algorithm must be changed, there may be a
cost in time and processing resources. Prior to activation of the appropriate algorithm, any inappropriate algorithm must be
disengaged. The notion and cost of a disengagement process also have been described in the visual attention literature by
Posner (1980; also see Experiment 2).

Conceptualizing normalization as a central form of adaptive processing suggests several possible accounts of the
significant RT cost for invalid cues in Experiment 2. First, in the initial appraisal of the stimuli, the system should be able to
perceive dissimilar stimuli more quickly, triggering a faster change in setting and thereby minimizing response time.
However, our empirical results refute this possibility. Second, for highly similar stimuli, the system may retain the existing
setting (i.e., there is no normalization), with some added noise due to even small differences in the stimuli. In this case, slower
response times and decreased accuracy would be expected. However, the systematic changes in RT as a function of similarity
that were found would not be predicted. Third, if the cost reflected only the time to disengage an algorithm, it should not differ
as a function of similarity. This possibility is also refuted by the results of Experiment 4, where reaction time is inversely
related to similarity. Thus, the most plausible explanation of the normalization process is that the time and effort needed to
adjust to an appropriate algorithm or setting is a function of similarity. Perceptually similar tokens are processed faster since
there is less variability to partition or factor out. Conversely, greater time and effort are required to normalize larger stimulus
variability.

Effects  of  Music  Experience 

The degree to which algorithms are effectively utilized during normalization seems to be a function of the amount of
experience with a particular class of stimuli. For example, performance was faster and more accurate in conditions that included
a piano stimulus, and many of our subjects were pianists. Greater formal knowledge of the theoretical structure underlying the
stimuli may be an important factor in stimulus processing in timbre normalization.

Highly practiced subjects should have efficient algorithms available to normalize for the effects of instrument
variability. A closely related alternative to this conceptualization is that musicians may use different types of knowledge (which
nonmusicians may not possess) to invoke imagery or schema-based representations in performing certain tasks. Most subjects in
Experiment 1 had limited musical experience. These subjects thus may have had relatively inefficient, and possibly
inaccurate, normalization algorithms that probably depended more on immediate experience with the stimuli than on knowledge of
chords and instruments. Following this conceptualization, speed and accuracy could have been superior in the Single Instrument
condition because the subjects applied the same (possibly incomplete and inaccurate) normalization algorithm to the A and X
stimuli, resulting in equivalent normalization errors for A and X stimuli. Those same errors would not have been equivalent for
A and X stimuli in the Mixed Instrument condition, where different algorithms must be applied to the two stimuli. The result
would be an increase in perceived differences in the pitch attribute, and thus the observed increase in errors in the Mixed
Instrument condition.

The normalization process might well involve some form of auditory imagery or schema. For example, Subject 1
reported anticipating a rapid stimulus onset when cued for the piano. Subject 5 (Experiment 2) also reported attempting to
anticipate the complete chord played by the cued instrument. These anticipations are consistent with the use of some form of
imagery, and may reflect the typical nature of expectancies for at least some subjects in the music normalization process.

An excellent study by Crowder (1989) on auditory imagery for timbre used the same basic strategy as our first two
experiments, but with somewhat different goals and results. Experiment 1 of the Crowder study attempted to demonstrate the
effects of instrument variability on pitch judgments for single tones, establishing a basis for demonstrating positive and negative
effects of imagery in the second experiment. Crowder obtained a main effect of instrument (with significantly faster RTs on
same-instrument trials), thus, in effect, demonstrating normalization. However, these normalization results were qualified by a
significant instrument-by-pitch interaction, such that the main effect of instrument was observed only on same-pitch trials. Thus,
as with our normalization results (e.g., Experiment 1), instrument variability resulted in increased response latency. Crowder also
had some problems with subject accuracy, as we did in our Experiment 1. Data from 3 subjects in the Crowder study (who
performed significantly below chance) were discarded in order to obtain the predicted effects of timbre variability.

In his Experiment 2, Crowder used a cuing technique that also involved a self-paced AX task (the fixed trial structure
in our Experiment 2 used a single-interval identification task with a visual cue). The subjects were instructed to imagine the
presented sine tone being played by a certain instrument. After the subjects had indicated the formation of an image, an
instrument tone was presented, whereupon a "same/different instrument" judgment was made. Crowder obtained an interaction of
pitch and imagined timbre similar to that found in his first experiment, with the expected effect of instrument variability
observed only for same-pitch trials. Thus, it appears that subjects can actively image timbre, with processing costs (slower RT)
for (imagined) properties that are not consistent with subsequently presented stimuli. These results are compatible with the
normalization findings in our Experiment 2, where subjects appeared to actively engage in setting parameters based on expected
stimulus properties. Thus, although our subjects in Experiment 2 were not specifically instructed to use auditory imagery, our
results are consistent with evidence for the use of timbral imagery. In contrast to our Experiment 2, the subjects in Experiment
1 need not have actively generated from memory an internal representation of an expected auditory stimulus, but rather may
have compared a trace (or echoic image) of the first stimulus with the second stimulus.

Timbral  Contributions  of  Spectral  Characteristics  and  Attack 

Experiment 3 demonstrated that upper harmonics, but not attack functions, play a significant role in the timbral
characteristics of instruments, and thus should be most subject to normalization processes in the discrimination of chords.
Similarly, Pitt and Crowder (1992) demonstrated an inability to image rise time (loudness), a salient component of attack
functions, and concluded that timbral imagery is based primarily on spectral properties.

The role of dynamic onset properties in timbre representation may be a function of the length of the stimulus (e.g.,
see Handel, 1989). For example, in order for the attack functions to influence performance in any AX task (as in our
Experiments 1, 3, and 4), subjects must compare the X stimulus attack with some form of representation of the A stimulus
attack. This representation could be a trace (or a type of echoic image) or, alternatively, an encoded version of the attack.
These possible representations are respectively equivalent to what have been called "trace coding" and "context coding" processes
(e.g., Macmillan, Braida, and Goldberg, 1987).

A trace memory decays rapidly over time. The vast majority of estimates suggest that trace decay should be
complete within 1-2 s (e.g., Darwin, Turvey, and Crowder, 1972; Treisman and Rostron, 1972). As ISI approaches or exceeds
this 1-2 s limit, context coding becomes the more viable strategy.

Let us temporarily assume that our approximately 1-s stimuli include a 100-ms attack portion. With an ISI of
1.5 s, the functional delay for comparing the attack functions of the A and X stimuli therefore becomes [(1 s - 100 ms) + 1.5 s],
or 2.4 s. Thus, even in the absence of masking from the final portion of the A stimulus (which also may occur), an adequate trace
of the attack function of the A stimulus will no longer be available for comparison with the X stimulus. The only attack
information for the A stimulus thus must come from some form of context coding.
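The delay arithmetic above can be written as a small helper. The function names are hypothetical. The 1-s and 2-s thresholds used to label the likely coding mode are simply the ends of the 1-2 s trace-decay range cited in the text. They are applied here as a coarse rule of thumb, not as a model of auditory memory.

```python
TRACE_MIN_S = 1.0  # lower end of the cited 1-2 s trace-decay estimates
TRACE_MAX_S = 2.0  # upper end of the cited 1-2 s trace-decay estimates


def attack_comparison_delay(stimulus_s, attack_s, isi_s):
    """Seconds from the end of the A stimulus's attack to the onset of X."""
    return (stimulus_s - attack_s) + isi_s


def likely_coding(delay_s):
    """Coarse label for which memory code could support the attack comparison."""
    if delay_s < TRACE_MIN_S:
        return "trace"
    if delay_s > TRACE_MAX_S:
        return "context"
    return "trace or context"


delay = attack_comparison_delay(stimulus_s=1.0, attack_s=0.100, isi_s=1.5)
print(f"{delay:.1f} s -> {likely_coding(delay)}")  # prints: 2.4 s -> context
```

With a shorter ISI of, say, 0.3 s, the same helper gives 1.2 s, which falls inside the ambiguous band. This illustrates the text's point that attacks should matter more when stimuli and ISIs are short.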

There also are several reasons to expect that context coding of attack information would be poor in longer stimuli. For
example, if context coding is capacity limited, then encoding should be best for simpler, more salient stimulus properties. In
longer stimuli, like those used in the current experiments, long-lasting, static information is (generally speaking) consistently
available after a relatively brief and more complex attack. Thus, following the logic based on limited capacity, static properties
(rather than attack properties) should be encoded.² Furthermore, if context coding requires processing time to access memories
for particular stimuli, then encoding of stimuli must be weak in instances where there is an adequate trace (as demonstrated
above for the attacks of the current stimuli).

As a result of both trace and context coding, attacks should have little influence on comparisons of timbres given long
stimuli and/or ISIs. It then should not be surprising that eliminating attacks had little influence on similarity judgments in
Experiment 3 or on the normalization task of Experiment 4. In summary, attack functions can easily be argued to be more
salient features of timbre in shorter stimuli (as opposed to those used in the current experiments), where attacks are better
represented in (trace) memory.

Identification of the physical properties relevant to normalization may have important implications for understanding
how the perceptual system processes auditory information. In a commentary on talker normalization, Pisoni (1990) indicated that
the inability to identify the nature of source variability has hindered researchers from making significant advances in solving the
problem of mapping invariant attributes of the physical signal onto abstract linguistic units. Experiments 3 and 4 address this
concern in two ways for music stimuli. First, possible sources of variability were identified through stimulus alterations whose
perceptual relevance was evaluated using similarity scaling procedures. Experiment 4 then provided evidence that the overtones
were the timbral components most subject to normalization. Second, a relationship between perceived stimulus similarity and
reaction time was obtained, indicating that additional time was required to process signals that were judged to be more
dissimilar. This increase in reaction time for dissimilar items seems to reflect the degree of adjustment necessary for the
system to "correct" for inappropriate expectancies. We do realize that the physical alterations performed on our stimuli were
rather extreme. Other, more subtle sources of spectral variability may therefore contribute to timbre and normalization, such as
detailed aspects of upper partials (e.g., intensity patterns, decay properties).

Conclusions 

In addition to providing further insight about normalization, the present study has important implications for the
auditory attention domain. One new way to characterize normalization is as a manipulation of a listener's attention to stimulus
features. When attending to inappropriate stimulus properties, the listener must redirect attention to appropriate settings before
the relevant processing can occur. The present findings of active normalization are consistent with selective attention processes,
in which the perceptual system prepares to receive and process certain expected stimulus properties.

The present investigation has shown that normalization can be used as an important tool in identifying and defining
critical auditory features utilized in signal perception. Experiment 4 demonstrated a high correlation between judged similarity
and the critical parameters used in processing music timbre. Future research in normalization, whether for music or for speech,
should not be limited to demonstrating different types of normalization, but instead should focus on determining the bases of
listener expectations and their relations to physical characteristics of the signal. The nature of normalization then will be better
established, as will the reason why the human auditory system sometimes cannot ignore certain task-irrelevant properties of the
signal.




References 

Allard, F. (1976). Physical and name codes in auditory memory. Quarterly Journal of Experimental Psychology, 28, 475-482.

American Standards Association (1960). American Standard Acoustical Terminology. New York.

Beal, A. L. (1985). The skill of recognizing musical structures. Memory & Cognition, 13, 405-412.

Crowder, R. G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and
Performance, 15, 472-478.

Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory analogue of the Sperling partial report procedure:
Evidence for brief auditory storage. Cognitive Psychology, 3, 255-267.

Fletcher, N. H. (1991). The Physics of Musical Instruments. New York: Springer-Verlag.

Garner, W. R. (1974). The Processing of Information and Structure. New York: Wiley.

Goldinger, S. D. (1992). Words and voices: Implicit and explicit memory for spoken words. Research on Speech
Perception: Progress Report No. 7 (Indiana University), 1-128.

Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of
America, 61, 1493-1500.

Grey, J. M., & Moorer, J. A. (1977). Perceptual evaluations of synthesized musical instrument tones. Journal of the
Acoustical Society of America, 62, 454-462.

Hall, M. D., & Pastore, R. E. (1992). Musical duplex perception: Perception of figurally good chords with subliminal
distinguishing tones. Journal of Experimental Psychology: Human Perception and Performance, 18, 752-762.

Hall, M. D., & Pastore, R. E. (1993). An auditory analogue to feature integration. Published Program for the 34th
Annual Meeting of the Psychonomic Society, 16 (Abstract #174).

Handel, S. (1989). Listening. Cambridge: MIT Press.

Johnson, K. (1988). Intonational context and F0 normalization. Research on Speech Perception: Progress Report No.
14 (Indiana University), 81-108.

Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. W. (1989). Effects of talker variability on speech perception by 2-month-old
infants. Research on Speech Perception: Progress Report No. 15 (Indiana University), 133-161.

Krumhansl, C. L., & Iverson, P. (1992). Perceptual interactions between musical pitch and timbre. Journal of
Experimental Psychology: Human Perception and Performance, 18, 739-751.

Liberman, A. M., & Mattingly, I. G. (1985). Motor theory of speech perception revisited. Cognition, 21, 1-36.

Liberman, A. M., & Mattingly, I. G. (1989). A specialization for speech perception. Science, 243, 489-494.

Logan, R. J. (1990). Talker normalization and speaker recognition by humans: One mechanism or two? Unpublished
doctoral dissertation, SUNY-Binghamton, Binghamton, NY.

Macmillan, N. A., Braida, L. D., & Goldberg, R. F. (1987). Central and peripheral processes in the perception of speech
and nonspeech sounds. In M. E. H. Schouten (Ed.), The Psychophysics of Speech Perception (pp. 28-45).
Dordrecht, the Netherlands: Martinus Nijhoff.

Mullennix, J. W., & Pisoni, D. B. (1989). Detailing the nature of talker variability effects in speech perception. Journal
of the Acoustical Society of America, 85, YY14.

Mullennix, J. W., Pisoni, D. B., & Martin, C. S. (1989). Some effects of talker variability on spoken word recognition.
Journal of the Acoustical Society of America, 85, 365-378.

Nusbaum, H. C., & Morin, T. M. (1989). Perceptual normalization of talker differences. Journal of the Acoustical
Society of America, 85, S125.

Pastore, R. E., & Scheirer, C. J. (1974). Signal detection theory: Considerations for general applications. Psychological
Bulletin, 81, 945-958.

Pisoni, D. B. (1990). Comments on talker normalization in speech perception. Research on Speech Perception: Progress
Report No. 16 (Indiana University), 413-422.

Pisoni, D. B., Carrell, T. D., & Gans, S. J. (1983). Perception of the duration of rapid spectrum changes in speech and
nonspeech signals. Perception & Psychophysics, 34, 314-322.

Pitt, M. A. (under review). Individual differences in the perception of pitch and timbre. Perception & Psychophysics.

Pitt, M. A., & Crowder, R. G. (1992). The role of spectral and dynamic cues in imagery for musical timbre. Journal of
Experimental Psychology: Human Perception and Performance, 18, 728-738.

Plomp, R. (1976). Aspects of Tone Sensation. New York: Academic Press.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental
Psychology: General, 109, 160-174.

Saldanha, E. L., & Corso, J. F. (1964). Timbre cues and the identification of musical instruments. Journal of the
Acoustical Society of America, 36, 2021-2026.

Summerfield, A. Q., & Haggard, M. P. (1975). Vocal tract normalization as demonstrated by reaction times. In G. Fant
and M. A. A. Tatham (Eds.), Auditory Analysis and Perception of Speech. London: Academic Press.

Verbrugge, R. R., Strange, W., Shankweiler, D. P., & Edman, T. R. (1976). What information enables a listener to map a
talker's vowel space? Journal of the Acoustical Society of America, 60, 198-212.

Whalen, D. H., & Liberman, A. M. (1987). Speech perception takes precedence over nonspeech perception. Science, 237,
169-171.

Wolpert, R. S. (1990). Recognition of melody, harmonic accompaniment, and instrumentation: Musicians vs.
nonmusicians. Music Perception, 8, 95-106.


Acknowledgments 

This research was supported by grant F496209310033 from the Air Force Office of Scientific Research and grant
BNS8911456 from the National Science Foundation. Opinions, findings, conclusions, and recommendations are the authors' and
do not necessarily reflect the views of the granting agencies.


Footnotes 

1. Because the octave location on the synthesizer corresponded to a different frequency range for the woodwinds, the chords
produced by the woodwinds were lower in frequency (by approximately one octave) than those of the other instruments.

2. Context coding should be richer given extensive experience with stimuli. Thus, listeners with greater music experience could
be conjectured to also have adequate encoding of more subtle stimulus properties, such as the attack functions of our stimuli.




Table 1. Mean percent correct and mean RT (plus standard error) for same and different chord trials in Experiment 1 for
single versus mixed instrument conditions. Results for different chord trials are further partitioned by the number of notes
that differ between the compared chords. (? marks a digit that is illegible in the source.)

                     SINGLE INSTRUMENT                 MIXED INSTRUMENT
Chords          Accuracy (%)    RT in ms         Accuracy (%)    RT in ms
SAME            98.2 (0.9)      1069 (36)        60.9 (?.6)      1175 (34)
DIFFERENT       88.5 (3.?)      1210 (50)        66.6 (1.4)      1204 (55)
  1-note:       80.7 (3.?)      1146 (30)        62.3 (2.0)      1199 (23)
  2-note:       95.3 (?.6)      1113 (37)        70.1 (3.3)      1199 (32)
  3-note:       98.0 (0.2)      1060 (47)        74.3 (3.5)      1230 (55)

Table 2. d' and RT for each instrument combination across all AX chord discrimination trials (independent of note differential
for different trials) in Experiment 1, including overall means and standard errors. (? marks a digit that is illegible in the source.)

Instrument Effects on d' and RT (Experiment 1)

Instrument     Instrument                  RT               RT
A Chord        X Chord          d'    (Same Chord)     (Diff. Chord)
Piano          Piano           4.25       1021             1087
               Brass           1.16       1172             1322
               Woodwind        0.60       1433             1270
               String          1.55       1190             1299
               Harpsichord     0.95       1238             1220
Brass          Piano           1.81       1173             1243
               Brass           4.64       1088             11?5
               Woodwind        0.99       1472             1229
               String          1.53       1185             1228
               Harpsichord     1.11       1243             1172
Woodwind       Piano           0.67       1193             1207
               Brass           0.50       1255             1301
               Woodwind        4.06       1090             1143
               String          0.47       1318             1325
               Harpsichord     1.04       1200             1333
String         Piano           1.44       1104             1214
               Brass           1.46       1238             1279
               Woodwind        0.69       1401             1263
               String          4.37       1129             1147
               Harpsichord     1.13       1328             1212
Harpsichord    Piano           1.26       1198             1181
               Brass           0.60       1284             1365
               Woodwind        0.89       1202             1334
               String          0.87       1279             1345
               Harpsichord     3.81       1040             1070
Mean                           1.67       1219             1237
s.e.                           0.29         54               59



Table 3. RT data on same chord trials for Subjects 7 and 13 (Experiment 1).

Chord          Same Instrument    Different Instrument

Subject 7
C-Major               853                  946
C-Minor               676                  964
E♭-Major              761                 1007
E♭-Minor              797                  989
Mean                903.3               1040.5
St. Dev.             64.3                 23.3

Subject 13
C-Major               953                 1127
C-Minor              1109                 1098
E♭-Major             1135                 1088
E♭-Minor              942                 1105
Mean               1034.8               1104.5
St. Dev.             87.8                 14.3

Subject 2
C-Major               840                  983
C-Minor              1015                 1054
E♭-Major              987                  922
E♭-Minor              888                  971
Mean                932.5                982.5
St. Dev.             96.6                 54.4

Subject 21
C-Major              1032                 1100
C-Minor              1011                 1066
E♭-Major              980                 1054
E♭-Minor              962                 1075
Mean                996.2               1073.8
St. Dev.             31.3                 19.5



Table 4. Mean RT and standard error data across subjects for each cue-instrument combination for Experiment 2.

Cue    Instrument    Mean RT    St. Error

VALID TRIALS:
P      P              642.99      83.17
B      B              767.36      92.72
S      S              758.19      85.83
H      H              744.66      81.32
Mean                  728.30

NEUTRAL TRIALS:
X      P              732.67     116.86
X      B              849.62      91.89
X      S              821.63      98.56
X      H              755.14      67.96
Mean                  789.76

INVALID TRIALS:
P      B             1120.88     167.38
P      S              981.71     136.27
P      H              802.52      73.30
B      P              885.24     147.29
B      S             1020.08     139.24
B      H             1029.34     167.23
S      P              953.77     118.35
S      B             1153.19     190.01
S      H             1044.03     131.04
H      P              712.62      89.19
H      B              950.21     128.87
H      S             1021.16     177.24
Mean                  972.89

Table 5. Decoding of symbols used in Figures 2-4.

Symbol    Stimulus                    Symbol    Stimulus
Ps        Synthetic Piano             Pn        Natural Piano
Bs        Synthetic Brass             Tn        Natural Trumpet
Ws        Synthetic Woodwind          Fn        Natural Flute
Ss        Synthetic Strings           Sn        Natural Strings
Hs        Synthetic Harpsichord       Hn        Natural Harpsichord
Pc        Cut-Attack Piano            Pf        Filtered Piano
Tc        Cut-Attack Trumpet          Tf        Filtered Trumpet
Fc        Cut-Attack Flute            Ff        Filtered Flute
Sc        Cut-Attack Strings          Sf        Filtered Strings
Hc        Cut-Attack Harpsichord      Hf        Filtered Harpsichord

Table 6. Mean similarity ratings for synthetic versus natural instrument comparisons.

Instrument Comparison   Mean Rating   St. Error
Sn - Ss                 6.51          0.14
Pn - Ps                 6.64          0.18
Hn - Hs                 5.76          0.31
Fn - Ws                 5.60          0.34
Tn - Bs                 6.31          0.18



Figure  Captions 

Figure 1. Mean reaction times for individual subjects on valid, neutral, and invalid visual instrument-cue trials in Experiment 2.

Figure 2. (a) Multidimensional similarity scaling (dimensions 1 and 2) for individual natural and synthetic instruments in Experiment 3, Condition 1; (b) multidimensional similarity scaling (dimensions 1 and 2) for intact natural stimuli and cut-attack versions in Experiment 3, Condition 2; (c) multidimensional similarity scaling (dimensions 1 and 2) for intact natural stimuli and filtered versions in Experiment 3, Condition 3.

Figure  3.  Linear  regression  of  mean  reaction  times  as  a  function  of  similarity  ratings  in  Experiment  4,  including  two  problem 
conditions. 


[Figures not reproduced. Surviving labels: "Individual Effects of Cue Validity on Reaction Time" (Valid, Neutral, Invalid); "Dimension 2"; "Reaction Time (ms)".]


Effects  of  Stimulus  Complexity  on  the 
Perceptual  Organization  of  Musical  Tones 


Michael  D.  Hall  and  Richard  E.  Pastore 
Center  for  Cognitive  and  Psycholinguistic  Sciences 
State  University  of  New  York  at  Binghamton 
Binghamton,  NY  13902-6000 

Abstract 

Duplex perception (DP) occurs when one stimulus or stimulus component contributes simultaneously to two distinct percepts. Two AX discrimination experiments were conducted to quantitatively evaluate the effects of stimulus complexity, a factor that Gestalt principles of perceptual organization would predict to affect the incidence of fusion in DP stimuli. Complexity was manipulated by varying the number of frequency components common to major and minor chords. Experiment 1 demonstrated frequent fusion of bases with a contralateral distinguishing tone. Data from both experiments provide some indirect evidence against the claim, made by some supporters of speech modularity, that musical DP research is really demonstrating triplex perception (i.e., perception of base, tone, and chord). The experiments further revealed that perception in the chord ear was altered when major/minor chords were presented contralateral to a different distinguishing tone. These alterations included perceptual migrations and fusion of contralateral distinguishing tones. They did not depend on stimulus position within a trial and were a direct function of stimulus complexity. The results are discussed in terms of the relationship between stimulus complexity and figural goodness, and are evaluated as possible examples of stimulus dominance and illusory feature conjunctions.


One major goal of auditory perception research is to identify general principles of perceptual organization. This goal frequently has taken the form of establishing the stimulus variables critical to the perception of complex stimuli. Much of this literature has concentrated on laboratory phenomena involving stimuli that consistently give rise to ambiguous or illusory percepts (e.g., the octave illusion; Deutsch, 1974). One such laboratory phenomenon, duplex perception (DP), typically has been demonstrated with speech stimuli. In DP, one stimulus, or stimulus component, simultaneously contributes to two distinct percepts. DP is claimed to represent a violation of the rule of disjoint allocation, which is derived from the Gestalt principle of belongingness. That rule states that one stimulus or component can contribute to only one perceptual stream (Bregman, 1987; Mattingly & Liberman, 1989; but see Bregman, 1990).

The following experiments with musical stimuli evaluated one factor that might affect the incidence of fusion in DP. The results address existing skepticism about the validity of musical DP. They are also consistent with the illusory conjunction of auditory features, and thus raise questions about the role of attention in tonal stimulus processing. Before the current experiments are described, a brief history of DP research is presented, including a brief evaluation of the theoretical issues DP has been used to address.

DP  Phenomena  and  the  Claim  for  a  Phonetic  Module 

DP was first demonstrated by physically splitting components of synthetic versions of /da/ and /ga/ syllables, which differ in place of articulation (Rand, 1974). A common form of DP for speech (DPS) presents two parts of the syllable dichotically, one to each ear. One part is the third formant (F3) transition, and the other is the remainder of the syllable, or base (e.g., Mann, Madden, Russell, and Liberman, 1981). The isolated F3 transitions are perceived as chirps. The F3 transitions also distinguish between /da/ and /ga/. Bases presented in isolation are often heard only as somewhat ambiguous or neutral syllables (i.e., usually not consistently labelled as either /da/ or /ga/). When the two parts are presented dichotically at normal listening levels, the result is two simultaneous perceptions, with the transition playing a critical role in each: (1) perception of the transition as an isolated chirp in one ear, and (2) perception of a complete /da/ or /ga/ syllable (base plus transition) in the base ear.

DPS is cited extensively by Liberman and colleagues (e.g., Liberman, Isenberg, and Rakerd, 1981; Liberman and Mattingly, 1985, 1989a, b; Whalen and Liberman, 1987) as evidence for the existence of a specialized, biologically significant phonetic module. This argument presumes that DPS percepts result from two separate, distinct types of processing performed on the transition. Chirp perception reflects the common nonspeech operation of a general, open auditory module, where perception corresponds relatively directly to the physical properties of the signal (pitch, loudness, and timbre). Processing by the closed speech module instead results in the perception of a CV syllable. In that case, stimulus and perception do not directly coincide except in terms of phonetically relevant stimulus properties.

DP has been replicated with analogous nonspeech stimuli, including musical tones (Collins, 1985; Hall and Pastore, 1992; Pastore, Schmuckler, Rosenblum, and Szczesiul, 1983) and door-slamming sounds (Fowler and Rosenblum, 1990). These replications have questioned DP-based conclusions for modularity. At a minimum, they demonstrate that other auditory modules operate in DP in ways that mirror processing by the speech module. We believe that such nonspeech conditions reveal that DP findings to date can be addressed equally well from modular and general auditory perspectives. Musical DP was first obtained by Pastore et al. (1983; later replicated, with reduced incidence, by Collins, 1985). A tone distinguishing a major from a minor chord (E or Eb) was presented dichotically with the remainder of the chord, the corresponding fifth interval (the C-G base). This distinguishing tone is presumed to play a role analogous to the transition in DPS. Many musically trained subjects reliably identified hearing both the isolated tone in the appropriate ear and a complete, fused major or minor chord in the base ear. As with isolated transitions in DPS, isolated tones were labelled less accurately than chords perceived by integration of tones and base. Thus, the musical base




Perceptual  Organization 


seems to act as a harmonic frame of reference against which judgments about the distinguishing tone can more easily be made.

Supporters of phonetic modularity have criticized findings of musical DP for not having ruled out triplex, rather than duplex, perception (Mattingly and Liberman, 1988). Triplex perception (TP) presumes that subjects accurately hear both the distinguishing tone and the base at their respective physical locations. It further presumes that subjects additionally perceive the integration of these stimuli as a centrally localized chord. However, we are not aware of any empirical evidence suggesting that TP exists for any stimuli. In fact, both musical and speech DP subjects generally report hearing stimuli at only two locations. As a secondary issue, the present experiments investigate the likelihood of TP with musical stimuli.

Gestalt  Notions  and  DP 

If both speech and nonspeech DP stimuli are processed in the same manner, then speech and nonspeech DP need not reflect the operation of distinct modules. For example, in Gestalt terminology (e.g., Wertheimer, 1958), syllables and chords both should represent "good" figures and "strong" figures. Good figures are simple, organized, unified perceptions of stimuli consisting of several components that frequently occur together. Strong figures resist analysis into separate components. Such good, strong figures are therefore perceived with less information (e.g., stimulus energy or impoverished components) than is required to separately perceive the components critical to their perception (e.g., transitions or chord-distinguishing tones). The result is "closure" (see below), and thus the precedence of the integrated percept apparent in DP findings.

Bregman (1987, 1990) has suggested that both speech and nonspeech DP arise when there is sufficient conflict between the cues used to segregate and to integrate two stimuli. Differential localization or quality is a major cue for segregating the base and distinguishing component. It could compete against integration cues that reflect either Gestalt principles of perceptual organization (e.g., good continuation and frequency/temporal proximity) or stimulus-specific, schema-based properties. Fusion to perceive chords or syllables then would presumably result from three factors:

(1) the synchronous presentation of components;
(2) in DPS, the end frequency of the transition corresponding to the initial steady-state frequency in the base; and
(3) in musical DP, the tone and fifth maintaining simple integer frequency ratios.

Motivated by such an analysis, Ciocca and Bregman (1989) demonstrated that DP for speech syllables weakens when the contralateral distinguishing component (transition) is part of a coherent stream reflecting the principles of similarity and good continuation. The Gestalt conceptualization not only should encompass both speech and nonspeech auditory examples, but also has been suggested by Bregman to generalize across modalities.

Motivation  for  Current  Research 

DP research cannot at present (and probably never can) provide unequivocal evidence in support of a postulated phonetic (and/or musical) module. In focusing on modularity, previous DP research has often overlooked critical perceptual issues that the DP paradigm is well suited to address. Perception of the stimuli used in studies of DP offers a unique opportunity to evaluate the contribution of specific variables to auditory perceptual organization. It can do so by (1) revealing the conditions necessary for perceptual integration, (2) evaluating the relative saliency of organizational cues, and (3) specifying the nature of the attentional and perceptual limits of the auditory system.

The strength of general perceptual (e.g., Gestalt) explanations of the processes underlying DP can be evaluated by revealing the various conditions necessary for frequent fusion with both speech and nonspeech stimuli. In so doing, we will gain a better understanding of (1) the critical cues for integration and segregation of stimuli, and (2) how these cues operate in the presence of other consistent or conflicting sources of information for grouping stimuli. Once the separate conditions for fusion in speech and nonspeech DP are established, analogous speech and nonspeech conditions could be compared directly, giving a more realistic route to resolving the phonetic modularity issue for DP research. Then, if analogous stimuli always produce similar patterns of perception, DP might reflect the operation of general auditory principles. If, however, analogous speech and nonspeech stimuli produce significantly different perceptual tendencies, the nature of distinct modes of processing could begin to be established.

Some stimulus variables critical to the incidence of DPS have been determined. Characterizing DP as a form of spectral/temporal fusion, Cutting (1976) evaluated the effects of several variables on fusion of speech stimuli. Fusion in DPS was relatively insensitive to changes in intensity (also see Whalen & Liberman, 1987) and frequency, but diminished with increasing asynchrony of component stimuli. Similar evaluations of possibly critical stimulus variables for musical DP have been lacking.¹ The present experiments represent an initial attempt to evaluate stimulus factors that may be critical to fusion in musical DP. Base complexity was selected as one variable that could affect the incidence of fusion. It is defined as the number of invariant base components shared between at least two labelling categories. For present purposes, increasing musical base complexity is defined as adding tones of different chroma to the original C-G (fifth) base.

Western music listeners generally have not heard the base presented in isolation. The base also cannot be resolved as a major or minor chord, the more commonly heard chord structures. Because the base is unusual in these ways, we argue that all of the bases used in the current experiments represent open forms. Manipulating the complexity of the base should alter the figural goodness of the chord that results when the distinguishing tone and base fuse (see below). As a result, we will use the term "base complexity" not only to describe our dichotic stimulus manipulation, but also to refer to alterations in stimulus (chord) complexity.

Base complexity represents one factor that long-standing Gestalt principles of perceptual organization would predict to influence the rate of perceptual integration. The Gestalt principle of closure is defined as the perceptual tendency to complete (close) physically incomplete (open) forms, resulting in the perception of good, strong figures instead of poor, weak figures. Component stimuli within a closed form are perceived as belonging to a more stable representation than if separately perceived ([author illegible], 1943). Assume that the base in DP is a relatively open form. Then stimulus properties that increase the goodness and strength of a (major or minor) chord could be regarded as increasing the tendency toward closure based upon the fusion of tone




and base. In other words, dichotic configurations that are more readily fused, and thus closed (or less open), could be assumed to represent better-articulated, more stable forms. Our major interest in determining the likelihood of any given perceptual organization (e.g., fusion) as a function of complexity was that such a determination may reveal the nature of the relationship between stimulus complexity and figural goodness for the current musical stimuli.

Does increasing the base complexity of musical stimuli result in more open or more closed stimuli? The Gestalt principles of good continuation and (both frequency and temporal) proximity suggest that chords with many tones may represent stronger, more figurally good forms than chords consisting of fewer tones (Wertheimer, 1958). Given the perceptual tendency to perceive good, strong figures, distinguishing tones then should be more easily integrated with bases of greater complexity. The tendency to fuse a tone and base in the base ear then should increase as the number of tones in the base increases.

Alternatively, adding tones means that the frequency ratios between component tones are no longer all simple integer ratios (Dowling and Harwood, 1986). Increasing the number of tones therefore decreases the overall consonance of chords. As a consequence, increasing complexity instead may decrease figural goodness for base stimuli and for bases fused with distinguishing tones. Thus, fusion of tone and base may become less likely as more tones are added to the base. We therefore cannot make explicit predictions regarding the probability of fusion with varied base complexity. Instead, we leave this as an important perceptual question for which the present research can provide an empirical answer.¹

EXPERIMENT  1:  Establishing  Effects  of  Base  Complexity 

Experiment 1 estimated the probability of many distinct perceptual organizations as a function of changes in stimulus (base) complexity. These organizations included not only fusion and TP, but also perceptual configurations that have been reported with other (speech and visual) stimuli.

Method 

Some of the current methods were motivated by DP findings. Previous musical DP demonstrations used major/minor chord labelling (Pastore et al., 1983; Collins, 1985), but labelling performance is not equally good across subjects, even when the subjects are musicians. This inconsistency across subjects makes it difficult to determine whether fusion is occurring for some subjects. Most people without extensive musical backgrounds, however, can reliably perceive differences between major and minor chords, even if some of them cannot consistently apply the appropriate label to each chord. The ability to discriminate chords that could arise only through fusion is strong evidence for fusion, so an AX (same-different) discrimination procedure was used.

Subjects. All subjects had studied at least one musical instrument (although not always an instrument capable of producing chords), and thus, in theory, understood the distinction between major and minor chords. However, we correctly expected that the musical expertise of potential subjects would vary widely. We therefore adopted, in both experiments, an a priori criterion of better-than-chance binaural chord discrimination, to ensure that all subjects had a working understanding of the perceptual difference between major and minor chords. Failure to discriminate binaural chords accurately made evaluation of dichotic perceptual organization impossible, so we discarded the data from any subject who did not meet the a priori criterion. In Experiment 1, eleven SUNY-Binghamton undergraduates who met the criterion participated as subjects in partial fulfillment of course requirements.¹ The experiment lasted approximately 45 minutes.
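The report does not state how "better than chance" was operationalized. The minimal sketch below assumes a one-sided exact binomial test against p = .5 at α = .05; that test is an illustrative choice, not the authors' stated method. It finds the smallest number of correct responses that would qualify. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n, alpha=0.05, p=0.5):
    """Smallest correct count whose one-sided tail probability falls below alpha.

    The tail shrinks as k grows, so the first k that passes is the minimum.
    """
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p) < alpha:
            return k
    return None  # no count is significant (only possible for very small n)


print(min_correct_above_chance(60))
```

For example, `min_correct_above_chance(60)` gives a cutoff for the 60 chord trials of the binaural block under this assumed test.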

Materials. Stimuli were generated from digitally sampled piano tones (Yamaha AWM Sound Expander EMT-10), recorded on cassette tape, and digitized (12-bit, 10-kHz sample rate) for on-line computer presentation, with a 4-kHz antialiasing filter. All tones were of equal length (1424 ms) and came from an equal-tempered scale. Their chroma and frequencies (Hz) were C (266), Eb (316), E (335), G (398), A (447), B (501), and D (597). The tones were then digitally mixed to produce the various base complexes and chords. All stimuli were presented over TDH-49 headphones at 75 dB SPL peak amplitude.
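The listed frequencies appear consistent, to within about 1 Hz, with 12-tone equal temperament referenced to C = 266 Hz, taking D from the octave above. That reference and the semitone mapping are an inference, not something the report states. The sketch compares predicted and reported values; the names are hypothetical.

```python
# Assumed semitone offsets above the reference C (D lies in the next octave up).
SEMITONES_ABOVE_C = {"C": 0, "Eb": 3, "E": 4, "G": 7, "A": 9, "B": 11, "D": 14}
# Frequencies (Hz) as listed in the Materials section.
REPORTED_HZ = {"C": 266, "Eb": 316, "E": 335, "G": 398, "A": 447, "B": 501, "D": 597}


def equal_tempered_hz(semitones, ref_hz=266.0):
    """Frequency `semitones` steps above ref_hz in 12-tone equal temperament."""
    return ref_hz * 2 ** (semitones / 12)


for name, n in SEMITONES_ABOVE_C.items():
    predicted = equal_tempered_hz(n)
    print(f"{name:>2}: predicted {predicted:6.1f} Hz, reported {REPORTED_HZ[name]} Hz")
```

The largest discrepancy is for B (about 502 Hz predicted versus 501 Hz listed). The discrepancy is small enough to reflect rounding or slight tuning of the sampled tones.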

E and Eb tones always distinguished chords as major (e.g., C-E-G) or minor (C-Eb-G). Chords were additionally distinguished by the number of tones (2, 3, or 4) constituting the base in dichotic trials. The 2-, 3-, and 4-tone bases were C-G, C-G-A, and C-G-B-D, respectively.
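Because every chord is a base plus one distinguishing tone, the stimulus set can be enumerated mechanically. The sketch below builds the major and minor chord for each base size. Its names are hypothetical; the chord labels follow those used in the binaural procedure.

```python
# Tones in ascending pitch order within the stimulus set (D lies an octave above C).
PITCH_ORDER = ("C", "Eb", "E", "G", "A", "B", "D")
BASES = {2: ("C", "G"), 3: ("C", "G", "A"), 4: ("C", "G", "B", "D")}
DISTINGUISHING = {"major": "E", "minor": "Eb"}
SUFFIX = {2: "", 3: " 6th", 4: " 9th"}


def chord_tones(n_base_tones, quality):
    """Base tones plus the distinguishing tone, in ascending pitch order."""
    tones = set(BASES[n_base_tones]) | {DISTINGUISHING[quality]}
    return tuple(sorted(tones, key=PITCH_ORDER.index))


def chord_name(n_base_tones, quality):
    """Label such as 'C-minor 6th'."""
    return f"C-{quality}{SUFFIX[n_base_tones]}"


for n in BASES:
    for q in DISTINGUISHING:
        print(chord_name(n, q), "-".join(chord_tones(n, q)))
```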

Procedure: Binaural Discrimination. After giving consent, subjects ran a block of 80 randomized binaural same-different (AX) discrimination trials. This block was intended as a baseline measure of subject performance for the subsequent dichotic trials derived from the same stimuli. Both binaural and dichotic trials consisted of the A (standard) and X (comparison) stimuli separated by a 1500-ms ISI, and ended with a 2-s response interval.
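The sketch below converts the trial timing into block durations for the 80 binaural and 240 dichotic (four conditions of 60) trials. It rests on two assumptions: that the 1500-ms ISI runs from the offset of A to the onset of X, and that no extra intertrial gap followed the 2-s response interval. Under those assumptions each trial lasts about 6.3 s. The names are hypothetical.

```python
TONE_MS = 1424       # duration of every tone (Materials)
ISI_MS = 1500        # A-to-X interval, assumed offset-to-onset
RESPONSE_MS = 2000   # response interval that ends each trial


def trial_ms(tone_ms=TONE_MS, isi_ms=ISI_MS, response_ms=RESPONSE_MS):
    """A stimulus + ISI + X stimulus + response interval (no extra intertrial gap)."""
    return tone_ms + isi_ms + tone_ms + response_ms


def block_minutes(n_trials):
    """Total trial time for a block, in minutes."""
    return n_trials * trial_ms() / 60_000


print(f"trial {trial_ms()} ms; binaural {block_minutes(80):.1f} min; "
      f"dichotic {block_minutes(240):.1f} min")
```

This gives roughly 8.5 min and 25.4 min, or about 34 min of trials in total. That leaves room for instructions and breaks within the reported 45-minute session.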

Each of the 6 possible chords was presented as the A stimulus on 10 trials. The chords were C-major (C-E-G), C-minor (C-Eb-G), C-major 6th (C-E-G-A), C-minor 6th (C-Eb-G-A), C-major 9th (C-E-G-B-D), and C-minor 9th (C-Eb-G-B-D). Each A stimulus (e.g., a major or minor C 6th chord) was paired equally often with itself and with its alternative (minor or major 6th) chord as the X stimulus. The remaining 20 trials ensured that subjects could also reliably distinguish isolated E and Eb tones. On these trials, each tone served as the A and as the X stimulus with an independent probability of 0.5. These binaural conditions also provided the empirical qualification criterion for subjects by directly quantifying "same"/"different" response tendencies for each condition and level of base complexity. These results will later be used to correct estimated perceptual probabilities for response tendencies.
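The binaural block described above can be generated as a randomized trial list. It consists of 6 chords × 10 trials, half "same" and half "different", plus 20 isolated-tone trials. This sketch assumes "paired equally often" means 5 same and 5 different trials per A chord. The names and seeding scheme are hypothetical.

```python
import random

CHORDS = ("C-major", "C-minor", "C-major 6th", "C-minor 6th", "C-major 9th", "C-minor 9th")
TONES = ("E", "Eb")


def alternative(chord):
    """Swap major <-> minor while keeping the chord type (triad, 6th, 9th)."""
    if "major" in chord:
        return chord.replace("major", "minor")
    return chord.replace("minor", "major")


def binaural_block(seed=None):
    """Return 80 shuffled (A, X) pairs: 60 chord trials plus 20 tone trials."""
    rng = random.Random(seed)
    trials = []
    for a in CHORDS:
        # 10 trials per A chord: 5 "same" (X = A) and 5 "different" (X = alternative).
        trials += [(a, a)] * 5 + [(a, alternative(a))] * 5
    for _ in range(20):
        # Isolated-tone trials: A and X drawn independently, each E or Eb with p = .5.
        trials.append((rng.choice(TONES), rng.choice(TONES)))
    rng.shuffle(trials)
    return trials


trials = binaural_block(seed=1)
print(len(trials), trials[:3])
```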

Dichotic Discrimination: Stimuli. After a short break, subjects began a block of 240 dichotic trials. Subjects were instructed to label each AX stimulus pair as same or different only with respect to the ear in which bases or complete chords were presented (henceforth, the target ear). They were told to ignore the information presented to the other ear (henceforth, the contralateral ear). Target ear assignment remained constant for a given subject but was counterbalanced across subjects.

Table 1 summarizes the dichotic stimuli along with possible nonveridical perceptual organizations (i.e., perceptions




differing from the physical configuration of tones) for each stimulus used. Throughout both tables and text, conditions are summarized using only 2-tone base components. Stimuli for 3- and 4-tone bases can be obtained by adding A or B-D tones, respectively, to the C-G base. Target ear (t.e.) tones are always displayed to the left of double vertical lines, and contralateral ear (c.e.) tones to the right.


Insert  Table  1 


The nonveridical percepts critical to our original hypotheses are instances of fusion (displayed in column 2 of Table 1). Fusion arises when subjects integrate a contralateral distinguishing tone with target ear information to perceive a single, unified percept in the target ear. For demonstration purposes, fusions are displayed as instances of DP, with the distinguishing tone simultaneously perceived as a separate event.

Consider first dichotic presentation of the base and a single distinguishing tone. If chords with many tones represent better-articulated figures, fusion should occur more frequently as base complexity increases. If increasing complexity instead decreases figural goodness, fusion of tone and base should decrease with increasing complexity.

Now consider two distinguishing tones presented dichotically, one of them physically mixed with the base. Here fusion should result in the perception of a chord containing both E and Eb tones, which is neither major nor minor. Such a chord should be an unstable (dissonant) figure, so these fusions should not occur frequently. If increasing complexity increases figural goodness, fusion of contralateral distinguishing tones should decrease as complexity increases. Conversely, if increasing complexity decreases figural goodness, fusion of distinguishing tones should increase as complexity increases.
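The fusion predictions can be made concrete by treating a percept as a set of tones. Fusion adds the contralateral tone to the target-ear set, and the result is classified by which distinguishing tones it contains. This is an illustrative model only, with hypothetical function names, not the authors' analysis.

```python
DISTINGUISHING = {"E", "Eb"}


def fuse(target_ear, contralateral_ear):
    """Fusion: the contralateral tone is integrated into the target-ear percept."""
    return frozenset(target_ear) | frozenset(contralateral_ear)


def classify(tones):
    """Label a target-ear percept by which distinguishing tones it contains."""
    present = DISTINGUISHING & set(tones)
    if present == {"E"}:
        return "major"
    if present == {"Eb"}:
        return "minor"
    if present == {"E", "Eb"}:
        return "neither (dissonant)"
    return "base only"


# C-G || E fuses to a major chord; C-E-G || Eb fuses to a dissonant non-chord.
print(classify(fuse({"C", "G"}, {"E"})), "|", classify(fuse({"C", "E", "G"}, {"Eb"})))
```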

Another plausible nonveridical percept (shown in column 3 of Table 1) is "migration". Migrations were proposed to occur only when different chord-distinguishing tones (E and Eb) were presented to opposite ears, one of them mixed with the base. In these instances, both distinguishing tones would be perceived, but each in the ear contralateral to its physical location. While intuitively improbable, migrations, like fusions, maintain the unified perception of a figurally strong chord in the target ear and of a contralateral chord-distinguishing tone. Such invalid assignment of feature locations has been demonstrated in the visual attention literature (e.g., Treisman, 1990), and should also be likely to occur in audition. We will later describe how notions from the visual domain might apply to musical stimuli.

Column 4 of Table 1 depicts another nonveridical percept that has been demonstrated with speech stimuli. Repp (1978a, 1978b, 1978c) identified which of several dichotic pairs of CV syllables fused perfectly, giving rise to the perception of a single syllable. Labelling of these stimuli often exhibited patterns of stimulus dominance, a "... tendency of one stimulus in a specific dichotic pair to receive more correct responses than the other stimulus, regardless of the ear in which it occurs" (p. 133). A dominant stimulus element therefore contributes to perception regardless of where it is presented.

Stimulus dominance could occur only when two distinguishing tones were presented simultaneously to opposite ears, one of them physically mixed with the base (e.g., C-E-G || Eb). Dominance would result in the perception of the isolated distinguishing tone in both ears, preventing perception of the other distinguishing tone (i.e., C-Eb-G || Eb). Responses for any single condition cannot distinguish between migration and stimulus dominance. However, comparisons across conditions (addressed in the general discussion) will allow an evaluation of the likelihood of both percepts.
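Migration and dominance can be written as two transformations on the same dichotic configuration. The sketch below, with hypothetical names, shows why a single condition cannot separate them. For C-E-G || Eb, both transformations yield C-Eb-G in the target ear. They differ only in the contralateral ear, which subjects were told to ignore.

```python
DISTINGUISHING = frozenset({"E", "Eb"})


def migrate(target_ear, contralateral_ear):
    """Migration: each distinguishing tone is heard in the opposite ear."""
    t, c = frozenset(target_ear), frozenset(contralateral_ear)
    t_dist, c_dist = t & DISTINGUISHING, c & DISTINGUISHING
    return (t - t_dist) | c_dist, (c - c_dist) | t_dist


def dominate(target_ear, contralateral_ear):
    """Dominance: the isolated contralateral tone is heard in both ears and the
    other distinguishing tone is not heard at all."""
    t, c = frozenset(target_ear), frozenset(contralateral_ear)
    return (t - DISTINGUISHING) | (c & DISTINGUISHING), c


target, contra = {"C", "E", "G"}, {"Eb"}
m, d = migrate(target, contra), dominate(target, contra)
print("migration:", sorted(m[0]), "||", sorted(m[1]))
print("dominance:", sorted(d[0]), "||", sorted(d[1]))
```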

Possible triplex percepts are shown in column 5 of Table 1. Because there is no evidence for TP, TP was hypothesized not to occur. Suppose TP occurs in a manner consistent with the postulations of Liberman and colleagues (e.g., Mattingly and Liberman, 1989). Then subject responses should be equivalent to those given with veridical target ear perception. Suppose instead that subjects ignore the instructions to respond on the basis of target ear perception and are "distracted" into responding to the triplex percept at a central, abstract position. Then responses based on TP would be indistinguishable from instances of fusion. Therefore, the incidence of TP cannot exceed the joint probability of veridical perception and distraction. Distraction to a postulated triplex percept cannot be evaluated directly, but we did test for "distraction" to respond to contralateral ear information in Experiment 2. If such distractions are rare, then responses reflecting an integration of tone and target ear information (base or chord) would be more consistent with fusion than with TP.

Dichotic Conditions. All conditions were generated using the stimuli summarized in column 1 of Table 1, but with varying levels of base complexity. The conditions were designed for two purposes. The first was to assess subject tendencies to fuse both, one, or neither of the A and X stimuli as a function of component arrangement and level of complexity. The second was to minimize possible overall response biases. In each of four conditions, 60 randomly mixed trials were presented, with 20 trials for each level of base complexity. Ten of those trials presented one configuration of distinguishing tone(s) and base; the other 10 substituted E for Eb and vice versa.
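The counts above account for the 240-trial dichotic block: four conditions × three base sizes × two tone configurations × 10 trials. The sketch assumes all 240 trials were shuffled together. The source does not say whether conditions were intermixed, and the field names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

N_CONDITIONS = 4                                 # four dichotic conditions (Table 2)
COMPLEXITY_LEVELS = (2, 3, 4)                    # number of tones in the base
CONFIGURATIONS = ("as listed", "E/Eb swapped")   # the two tone configurations
TRIALS_PER_CELL = 10


def dichotic_block(seed=None):
    """Return one shuffled list of trial descriptors covering every design cell."""
    rng = random.Random(seed)
    trials = [
        {"condition": cond, "base_tones": k, "configuration": cfg}
        for cond, k, cfg in product(range(1, N_CONDITIONS + 1), COMPLEXITY_LEVELS, CONFIGURATIONS)
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)
    return trials


trials = dichotic_block(seed=0)
print(len(trials), Counter(t["condition"] for t in trials))
```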

Table 2 lists the dichotic conditions for both experiments, including the responses expected for each possible perceptual organization. Only one configuration of distinguishing tones is listed; the alternative configuration can be obtained by substituting E for Eb and vice versa. Conditions are discussed in the order in which they appear in the table, so that the table may be used as a reference throughout the description of conditions and response predictions.

Insert  Table  2 

Each condition was designed to evaluate the perceptual organization indicated by the condition label. For example, the FUSE-EITHER Condition is an exclusive-or (XOR) condition for fusion. In it, "different" responses will result from fusion of either the A or the X stimulus, but not both. All other percepts would result in "same" responses. Since the physical configurations of the A and X stimuli were identical in this condition, "same" responses were hypothesized to predominate.
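The expected rate of "different" responses in the FUSE-EITHER Condition follows directly from the XOR rule under two simplifying assumptions. The first is that A and X are fused independently with probabilities p_A and p_X. The second is that every non-fusion percept yields "same". Both assumptions, and the function name, are illustrative rather than taken from the report.

```python
def p_different_fuse_either(p_fuse_a, p_fuse_x):
    """P("different") when exactly one of A and X is fused (XOR rule),
    assuming independent fusion and "same" for every other percept."""
    return p_fuse_a * (1 - p_fuse_x) + (1 - p_fuse_a) * p_fuse_x


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(p_different_fuse_either(p, p), 3))
```

With p_A = p_X = p, this reduces to 2p(1 - p), which never exceeds .5. Under these assumptions, "same" responses should therefore be at least as frequent as "different" responses, whatever the fusion rate.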


Perceptual  Organization 




The FUSE-NEITHER Condition was generated by substituting the alternative distinguishing tone in the X stimulus of the FUSE-EITHER Condition. Only by appropriately perceiving target ear information in both stimuli would subjects produce "same" responses; fusing either (or both) distinguishing tone(s) would instead result in nonequivalent percepts. Given the demonstrated tendency to fuse these musical stimuli (e.g., Pastore et al., 1983), high rates of "different" responses were hypothesized. If chords consisting of many tones represent better-articulated figures, we would predict an increased frequency of "different" responses with increasing base complexity. Conversely, if chords with many tones represent poorly articulated figures, "different" responses should decrease with increasing base complexity.

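The response predictions for these two conditions can be checked mechanically. The sketch below is a hypothetical model rather than the authors' procedure: it assumes each stimulus is perceived either veridically (target ear tones only) or fused (target ear tones plus the contralateral tone), uses a C-G base with E or E♭ (written "Eb") as illustrative stimuli, and scores a "same" response whenever the two target ear percepts match.

```python
from itertools import product

# Hypothetical model: a dichotic stimulus is (target-ear tones, contralateral tones).
# "Eb" stands for E-flat.
Stimulus = tuple[frozenset[str], frozenset[str]]
ORGANIZATIONS = ("veridical", "fused")


def percept(stim: Stimulus, organization: str) -> frozenset[str]:
    """Target-ear percept of one stimulus under an assumed perceptual organization."""
    target, contra = stim
    if organization == "veridical":
        return target              # only the tones actually presented to the target ear
    if organization == "fused":
        return target | contra     # contralateral tone integrated with the base
    raise ValueError(f"unknown organization: {organization}")


def predicted_responses(a: Stimulus, x: Stimulus) -> dict[tuple[str, str], str]:
    """Map each (A organization, X organization) pair to a 'same'/'different' response."""
    return {
        (oa, ox): "same" if percept(a, oa) == percept(x, ox) else "different"
        for oa, ox in product(ORGANIZATIONS, ORGANIZATIONS)
    }


BASE = frozenset({"C", "G"})
# FUSE-EITHER: A and X physically identical (C-G || E).
fuse_either = predicted_responses((BASE, frozenset({"E"})), (BASE, frozenset({"E"})))
# FUSE-NEITHER: X carries the alternative distinguishing tone (C-G || Eb).
fuse_neither = predicted_responses((BASE, frozenset({"E"})), (BASE, frozenset({"Eb"})))

for pair in fuse_either:
    print(pair, fuse_either[pair], fuse_neither[pair])
```

Under this model, fusing both stimuli or perceiving both veridically yields "same" in FUSE-EITHER, whereas only veridical perception of both yields "same" in FUSE-NEITHER, which is the XOR logic described above.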
The remaining conditions involved dichotic presentation of both distinguishing tones (one physically mixed with the base), and thus allowed evaluation of migration or dominance. The FUSE-1 Condition was designed to determine the probability of fusing only the X stimulus. "Same" responses would be obtained only when subjects appropriately perceived target ear information in the A stimulus and fused the X stimulus (both resulting in the target ear perception of C-E-G in the example). Due to the expected high rate of fusion of the X stimulus (similar to the FUSE-NEITHER stimuli above), predominantly "same" responses were hypothesized. Because fusion of contralateral distinguishing tones (or TP) would result in the perception of a dissonant, unstable form, and migration would require mislocating two component tones, a low probability of altered target ear perception was expected for the A stimulus. Furthermore, if increasing complexity increases figural goodness, "same" responses should increase as a function of increasing complexity (and, conversely, if increasing complexity decreases goodness, "same" responses should decrease).

Finally,  the  FUSE-BOTH  Condition  could  reflect  the  likelihood  of  fusing  contralateral  distinguishing  tones  of  both 
stimuli to perceive figurally poor, weak chords (e.g., C-E♭-E-G), which would result in "same" responses. As noted for the
FUSE-1  Condition,  neither  fusion  nor  migration  of  these  stimuli  was  expected.  However,  migration  or  dominance  of 
distinguishing  tones  for  either  the  A  or  X  stimulus  also  would  result  in  "same"  responses.  Other  perceptions  would  result  in 
"different"  responses,  including  the  hypothesized  veridical  perception. 

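The same kind of enumeration can be extended to the chord conditions by adding migration/dominance. This is again a hypothetical sketch: the stimuli (A = C-E-G || E♭ in both conditions; X = C-G || E in FUSE-1 and C-E♭-G || E in FUSE-BOTH) are illustrative, and migration/dominance is modeled as the contralateral distinguishing tone replacing the one heard in the target ear, which is an assumption about how these organizations affect the target ear percept.

```python
from itertools import product

# "Eb" stands for E-flat; E and Eb are the two distinguishing tones.
DISTINGUISHING = frozenset({"E", "Eb"})
ALL_ORGS = ("veridical", "fused", "migrated")


def percept(target: frozenset[str], contra: frozenset[str], organization: str) -> frozenset[str]:
    """Target-ear percept of one dichotic stimulus under an assumed organization."""
    if organization == "veridical":
        return target
    if organization == "fused":
        # Contralateral tone integrated with the target-ear chord.
        return target | contra
    if organization == "migrated":
        # Assumed model of migration/dominance: the contralateral distinguishing
        # tone replaces the distinguishing tone heard in the target ear.
        return (target - DISTINGUISHING) | contra
    raise ValueError(f"unknown organization: {organization}")


def same_pairs(a, x, a_orgs=ALL_ORGS, x_orgs=ALL_ORGS):
    """Return the (A, X) organization pairs whose target-ear percepts match."""
    return {
        (oa, ox)
        for oa, ox in product(a_orgs, x_orgs)
        if percept(*a, oa) == percept(*x, ox)
    }


def stim(target: str, contra: str):
    """Build a (target-ear, contralateral) stimulus from dash-separated tone names."""
    return frozenset(target.split("-")), frozenset(contra.split("-"))


# FUSE-1: X is a base plus tone, so only veridical perception or fusion is considered for X.
fuse_1 = same_pairs(stim("C-E-G", "Eb"), stim("C-G", "E"), x_orgs=("veridical", "fused"))
# FUSE-BOTH: both stimuli are a chord plus the opposite contralateral distinguishing tone.
fuse_both = same_pairs(stim("C-E-G", "Eb"), stim("C-Eb-G", "E"))

print(sorted(fuse_1))
print(sorted(fuse_both))
```

Under this model, FUSE-1 yields "same" only for veridical perception of A with fusion of X, while FUSE-BOTH yields "same" for fusion of both stimuli or for migration/dominance in exactly one of them; veridical perception of both yields "different".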
Results  and  Discussion 

Binaural (Baseline) Discrimination. Mean accuracy in terms of percent correct (with standard errors) on binaural discrimination trials is shown in the top panel of Table 3. All subjects discriminated isolated tones (E and E♭) with perfect accuracy, and discriminated chords at high levels of accuracy. If we assume that dichotic fusion provides the same underlying basis for perception as presentation of physically mixed stimuli, then mean accuracy rates for binaural discrimination should provide a baseline of chord discriminability for evaluating dichotic results.⁴

Chord discrimination results were analyzed in a 2 × 3 ANOVA, with chord (same and different trials) and complexity (number of tones) as the respective factors. The only significant effect was the chord × complexity interaction (F(2,20) = 5.344, p = .0138). The nature of the interaction can be seen in the table of means. Accuracy increased slightly with increasing complexity on same-chord trials (a nonsignificant simple main effect, F(2,20) = 1.477, p = .252). However, accuracy decreased significantly with increasing complexity on different-chord trials (simple effect, F(2,20) = 4.045, p = .033). This interaction suggests that, with increasing chord complexity, major and minor chords may be perceived as increasingly similar, with the processing of chord-distinguishing tones becoming more difficult. These binaural results were later used to adjust mean performance on dichotic trials (below) to assess the probability of fusion in each condition.

Dichotic Discrimination: Overall. Mean percentage of "same" responses for each dichotic condition and level of base complexity is shown in the lower panel of Table 3. An initial 4 × 3 ANOVA was conducted with dichotic condition and base complexity as respective factors. As expected, all effects were significant, including the main effects of base complexity (F(2,20) = 23.269, p < .0001) and condition (F(3,30) = 26.144, p < .0001), plus their interaction (F(6,60) = 7.96, p < .0001). Effects within conditions are reported below in the context of individually evaluating the perception of contralateral tone and base, as well as contralateral tone and chord.


Insert  Table  3 


Because the combinations of perceptual organizations leading to a response differed across conditions, it was possible to solve for the probability of particular organizations using a set of simultaneous probability equations based on data across conditions. Therefore, where applicable, the dichotic means from Table 3 also were submitted to a series of probability formulae. These formulae also used the binaural chord discrimination results to adjust for the tendency to perceive chords as more similar with increasing complexity.

The formulae were based on a few reasonable simplifying assumptions, the validity of which was verified for similar stimuli in an earlier musical DP study (Hall and Pastore, 1992). First, the incidence of a given type of perceptual organization was assumed not to vary between isolated E and E♭ distinguishing tones. Thus, results were collapsed with respect to distinguishing tones within each stimulus and condition. Second, since there is no basis for comparison at the time the A stimulus is presented, perception of the A stimulus should not depend on the nature of the X stimulus. Thus, the incidence of a given type of perceptual organization was assumed to be equivalent across conditions for A stimuli of similar structure. Finally, for conditions based solely upon stimuli containing a contralateral distinguishing tone and base (i.e., the FUSE-EITHER and FUSE-NEITHER Conditions), equal, independent probabilities of fusion were assumed for A and X stimuli. The results of the probability formulae for both experiments are shown in Table 4, calculated individually for each type of stimulus and each level of base complexity. Derivations of the formulae are found in the Appendix.




Insert  Table  4 


Perception of Contralateral Tone and Base. Simple effects of base complexity were not significant for the FUSE-EITHER and FUSE-NEITHER Conditions (F = .018, p < .900, and F = 2.144, p < .143, respectively). As a reminder, the high incidence of "same" responses on FUSE-EITHER trials could reflect either fusion of both the A and X stimuli or veridical perception of both. The moderate rate of "different" responses on FUSE-NEITHER trials was consistent with the fusion of tone and base in one or both stimuli.

Means  from  both  conditions  were  submitted  to  probability  formulae  to  estimate  the  incidence  of  fusion  and  veridical 
perception  of  contralateral  tone  and  base.  The  probability  of  fusing  both  stimuli  in  the  FUSE-EITHER  and  FUSE-NEITHER 
Conditions  (calculated  using  Appendix  Eq.  3’),  displayed  in  the  first  panel  of  Table  4,  was  moderately  high,  with  a  slight 
decrease from 2- to 3-tone levels of base complexity. Assuming equal, independent probabilities for fusing A and X stimuli, we obtain the probability of fusing an individual (A or X) stimulus (see Table 4), which occurred at a substantial rate (p = 0.65 to 0.75).
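The Appendix formulae (e.g., Eq. 3') are not reproduced in this section. As a rough, hypothetical illustration of how the two conditions can be combined, the sketch below assumes that each stimulus is either fused (probability p) or perceived veridically, that p is equal and independent for A and X, and that no guessing correction is applied. Under these assumptions P(same | FUSE-EITHER) = p² + (1 - p)² and P(same | FUSE-NEITHER) = (1 - p)², so their difference estimates p², the probability of fusing both stimuli, and its square root estimates p. The function name and the input proportions are illustrative, not values from Table 3.

```python
import math


def fusion_estimates(same_fuse_either: float, same_fuse_neither: float) -> tuple[float, float]:
    """Estimate (P(fuse both), P(fuse a given stimulus)) from 'same' response proportions.

    Hypothetical two-organization model, no guessing correction:
      P(same | FUSE-EITHER)  = p**2 + (1 - p)**2
      P(same | FUSE-NEITHER) = (1 - p)**2
    so the difference between the two proportions equals p**2.
    """
    p_both = same_fuse_either - same_fuse_neither
    if not 0.0 <= p_both <= 1.0:
        raise ValueError("proportions are inconsistent with the two-organization model")
    return p_both, math.sqrt(p_both)


# Illustrative proportions only (not the data reported in Table 3).
p_both, p_single = fusion_estimates(0.58, 0.09)
print(f"P(fuse both) = {p_both:.2f}, P(fuse a given stimulus) = {p_single:.2f}")
```

The actual estimates in Table 4 additionally incorporate the binaural discrimination results, so this sketch only shows the structure of solving simultaneous equations across conditions.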

Perception of Contralateral Tone and Chord. Significant simple effects of base complexity were obtained for the FUSE-1 and FUSE-BOTH Conditions (F(2,20) = 12.891, p < .0001, and F = 22.044, p < .0001, respectively), such that "same" responses increased with increasing base complexity. In the FUSE-1 Condition, the complexity effect was consistent with an increased tendency to veridically perceive the A stimulus target ear information while fusing tone and base in the X stimulus. The FUSE-BOTH Condition effect was consistent with an increased tendency either to fuse both A and X stimuli, or to veridically perceive one stimulus with migration/dominance of the other stimulus.

The minimum rate of veridical target ear perception for the A stimulus (a contralateral tone and chord) in the FUSE-1 and FUSE-BOTH Conditions was estimated (using Appendix Eq. 4) to increase as a function of increasing base complexity (top of second panel, Table 4). This also represents a minimum estimate for fusing contralateral tone and base in the X stimulus of the FUSE-1 Condition, which, as expected, is below the estimated rate of fusion for similar stimuli in the FUSE-EITHER and FUSE-NEITHER Conditions. Rather than make additional, potentially invalid assumptions to estimate a range of probabilities of potential fusion and migration/dominance of contralateral distinguishing tones in the FUSE-1 and FUSE-BOTH Conditions, we included in Experiment 2 conditions designed to reflect the operation of a unique perceptual organization, in order to evaluate these probabilities. Clearly, however, the results of both the FUSE-1 and FUSE-BOTH Conditions indicate frequent nonveridical perception of contralateral distinguishing tones, the nature of which will be explored in greater detail in Experiment 2.
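The reasoning behind treating one observed rate as a minimum for two different probabilities can be made explicit. The following is a simplified sketch, not the Appendix's Eq. 4: it assumes that perception of the A and X stimuli is independent (as in the second simplifying assumption above) and omits the binaural guessing correction. Writing V_A for veridical target ear perception of A and F_X for fusion of X, a "same" response in the FUSE-1 Condition requires both events, so

```latex
P(\text{same} \mid \text{FUSE-1}) = P(V_A)\,P(F_X) \le \min\{P(V_A),\; P(F_X)\}
```

Because each factor is at most 1, the observed proportion of "same" responses is a lower bound on both P(V_A) and P(F_X), which is why the same estimate serves as a minimum for veridical perception of A and for fusion of X.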

Conclusions. A few conclusions can be drawn from the results of Experiment 1. First, subjects frequently fused bases with a contralateral distinguishing tone. Because binaural chord discrimination results were used to correct for subject guessing, Experiment 1 is argued to provide a reliable quantification of the rate of fusion for musical DP stimuli. In addition, if TP rather than fusion had given rise to the "different" responses obtained in the FUSE-NEITHER Condition, subjects must have ignored location in making their responses (i.e., must have responded to a chord percept at an abstract position between their ears). Therefore, although we cannot eliminate this possibility, TP seems to be, at best, very unlikely. However, the tendency to respond based upon an inappropriate location will be evaluated in Experiment 2.

The simple effects of base complexity in the FUSE-1 Condition could be attributable to (1) an increasing tendency to appropriately perceive target ear information in the A stimulus with increasing target ear complexity, and (2) a similarly increasing tendency to fuse the distinguishing tone and base of the X stimulus in the target ear. "Altered target ear perception," in the form of migration, dominance, or fusion of contralateral distinguishing tones, also was shown to change as a function of base complexity in the FUSE-BOTH Condition. Although migration and dominance were not expected perceptual organizations, and thus were not directly evaluated in our initial focus on fusion, altered target ear perception in the FUSE-BOTH Condition frequently might have been due to either of these perceptual tendencies, and thus warranted further investigation in Experiment 2. These instances of altered target ear perception suggest parallels with visual attention research, which has investigated how people attend to and analyze stimulus features under varying degrees of stimulus complexity. These parallels will be discussed in the general discussion section.


EXPERIMENT 2: Investigating Mislocalization of Component Tones
Experiment 2 further investigated the unexpected effects of base complexity in the FUSE-1 and FUSE-BOTH Conditions of Experiment 1. Experiment 1 included no explicit empirical test of possible differences in altered target ear perception depending on stimulus position (A or X) within a trial. The results from the FUSE-1 Condition suggest that target ear perception of the X stimulus was more readily modified to match veridical perception of the A stimulus. Thus, the A stimulus may often serve as a perceptual template that alters schema-driven aspects of the perception of the X stimulus. Experiment 2, therefore, included conditions designed to separately assess the likelihood of a given perceptual organization (fusion, migration, or dominance) for A and X stimuli.

Experiment 1 also did not determine whether subjects fused contralateral distinguishing tones to perceive figurally bad, dissonant chords; such perception was originally assumed, on theoretical grounds alone, to be highly unlikely. In Experiment 2, additional figurally poor, dissonant binaural and dichotic conditions were included in which both E and E♭ distinguishing tones were presented ipsilaterally. Dichotic conditions involving one such stimulus enabled estimation of the probability of fusion for alternative stimuli involving dichotic distinguishing tones (one of which was physically mixed with the base). Similarly, conditions were designed to determine the probability of stimulus migration or dominance by pairing stimuli containing a contralateral tone and chord with a monaural major/minor chord. Finally, to better address the unlikely possibility of TP from Experiment 1, conditions also were included to evaluate whether subjects were distracted in Experiment 1 into responding to an inappropriate location.

Method 

Subjects.  Musical  history  and  performance  criteria  were  the  same  as  those  of  Experiment  1.*  The  subjects  were  five 
SUNY-Binghamton  undergraduate  introductory  psychology  students  who  were  participating  in  partial  fulfillment  of  course 
requirements.  The  first  author  served  as  a  sixth  subject. 

Materials. In addition to the distinguishing tones, bases, and chord stimuli of Experiment 1, dissonant chords were generated by physically mixing both distinguishing tones (E and E♭) with bases of varying complexity (C-G, C-G-A, or C-G-B-D).

Procedure: Binaural Discrimination. In addition to the 80 AX binaural discrimination trials from Experiment 1, 45 randomly distributed dissonant chord trials were included to evaluate subject ability to discriminate major or minor chords from chords that included both distinguishing tones. Thirty of these trials consisted of 1 of the 6 major/minor chords from Experiment 1 (e.g., C-E-G-A, a C-major 6th chord) as the A stimulus, followed by a dissonant chord of similar complexity (C-E♭-E-G-A) as the X stimulus. Five repetitions of each AX combination were included. In the remaining 15 trials, each of the 3 dissonant chords served as both A and X stimuli (5 repetitions each).
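The composition of the 45 added binaural trials can be sketched as a trial list. The block below is illustrative only: the function names, the seeded shuffle used for randomization, and the spelling "Eb" for E♭ are assumptions, and the 80 Experiment 1 trials are not reproduced because their composition is not described here.

```python
import random

# Base tones by complexity level (number of base tones), as listed in the text.
BASES = {2: ("C", "G"), 3: ("C", "G", "A"), 4: ("C", "G", "B", "D")}
REPS = 5  # repetitions of each AX combination


def chord(level: int, *extra: str) -> frozenset[str]:
    """Physically mixed chord: the base at this level plus the given distinguishing tone(s)."""
    return frozenset(BASES[level]) | frozenset(extra)


def dissonant_block(seed: int = 0) -> list[tuple[frozenset[str], frozenset[str]]]:
    """Hypothetical list of (A, X) chord pairs for the added dissonant-chord trials."""
    trials = []
    for level in BASES:
        dissonant = chord(level, "E", "Eb")          # both distinguishing tones ("Eb" = E-flat)
        for tone in ("E", "Eb"):                     # major, then minor chord as the A stimulus
            trials += [(chord(level, tone), dissonant)] * REPS
        trials += [(dissonant, dissonant)] * REPS   # dissonant chord as both A and X
    random.Random(seed).shuffle(trials)              # randomly distributed within the block
    return trials


block = dissonant_block()
print(len(block), sum(a != x for a, x in block), sum(a == x for a, x in block))
```

The counts follow directly from the description: 3 levels × 2 chords × 5 repetitions gives the 30 different trials, and 3 dissonant chords × 5 repetitions gives the 15 same trials.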

Dichotic  Discrimination:  Stimuli.  Upon  completion  of  the  binaural  discrimination  trials,  and  after  a  short  break, 
subjects  performed  a  block  of  400  randomized  dichotic  discrimination  trials.  Target  ear  assignment  was  again  counterbalanced 
across  subjects.  Most  stimuli  were  those  of  Experiment  1,  constructed  from  dichotic  combinations  of  one  isolated  distinguishing 
tone with either a base or a chord determined to be major or minor by the inclusion of the alternative distinguishing tone. Bases and chords were of varied complexity. The remaining stimuli were monaural versions of the dissonant chords (of equally varying complexity) used in binaural discrimination, that is, dissonant chords without simultaneous dichotic tones.

Possible  perceptual  organizations  for  each  of  these  individual  trial  stimuli  are  again  listed  in  Table  1.  Each  dichotic 
condition  was  designed  with  the  expectation  that  a  particular  perceptual  organization  would  result  in  a  unique  pattern  of 
responses across conditions. In this way, dichotic conditions could be used to evaluate the tendency toward particular organizations. Only perceptual configurations of interest and corresponding responses will be discussed in the text. A complete listing of expected responses for all possible perceptual organizations and conditions (displayed for 2-tone bases and one configuration of distinguishing tones) is additionally provided in Table 2. Derivations of these responses can be obtained using
Table  1  and  the  following  description  of  dichotic  conditions. 

Dichotic Conditions. Sixty AX trials were randomly presented for each of 6 experimental dichotic conditions. Within each experimental condition, 20 trials were presented at each level of complexity. Within each level of complexity, 10 trials employed one configuration of distinguishing tones for A and X stimuli (e.g., C-E-G || E♭); the other 10 trials used the opposite configuration (i.e., C-E♭-G || E). Two additional control conditions of 20 trials each were included to evaluate response bias; in these trials there again was an equal probability (p = 0.5) of either configuration of distinguishing tones. To limit the number of dichotic trials, control conditions involved only the simplest level of target ear complexity. We now review predictions for conditions in the order in which they appear in Table 2.
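The trial counts above imply a block of 6 × 60 + 2 × 20 = 400 dichotic trials, matching the block size stated earlier. A hypothetical sketch of building such a block follows; the configuration labels, the seeded shuffle, and the exact 10/10 split in the control conditions (the text states only an equal probability for each configuration) are assumptions.

```python
import random
from collections import Counter

EXPERIMENTAL = ("MIGRATE-1ST", "MIGRATE-2ND", "DISTRACT-NO.FUSE",
                "DISTRACT-FUSE.2ND", "DISSONANT-FUSE.1ST", "DISSONANT-FUSE.2ND")
CONTROL = ("DISTRACT-CONTROL", "NO.FUSE-CONTROL")
LEVELS = (2, 3, 4)  # number of base tones (C-G, C-G-A, C-G-B-D)
# Hypothetical labels for the two distinguishing-tone configurations.
CONFIGS = ("E in chord, Eb contralateral", "Eb in chord, E contralateral")
PER_CONFIG = 10     # trials per configuration within each condition/level cell


def dichotic_block(seed: int = 0) -> list[tuple[str, int, str]]:
    """Build and shuffle a hypothetical (condition, level, configuration) trial list."""
    trials = [
        (cond, level, cfg)
        for cond in EXPERIMENTAL
        for level in LEVELS
        for cfg in CONFIGS
        for _ in range(PER_CONFIG)
    ]
    # Control conditions use only the simplest level; configurations balanced (assumed).
    trials += [
        (cond, LEVELS[0], cfg)
        for cond in CONTROL
        for cfg in CONFIGS
        for _ in range(PER_CONFIG)
    ]
    random.Random(seed).shuffle(trials)
    return trials


block = dichotic_block()
per_condition = Counter(cond for cond, _, _ in block)
print(len(block))
print(per_condition)
```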

The MIGRATE-1ST and MIGRATE-2ND Conditions evaluated the likelihood of subjects migrating contralateral distinguishing tones (or, alternatively, exhibiting stimulus dominance effects) in the A or X stimulus, respectively. Since the X stimulus in the MIGRATE-1ST Condition and the A stimulus in the MIGRATE-2ND Condition both represent good figures (major or minor chords) with no tones presented to the contralateral ear, they were assumed to be perceived veridically. "Same" responses then would be obtained if subjects migrated contralateral distinguishing tones in the remaining stimulus. If altered target ear perception by fusion or migration of distinguishing tones occurs more frequently for X stimuli in order to match veridical target ear perception of the A stimuli, then "same" responses should occur more frequently in the MIGRATE-2ND Condition (and should increase with increasing base complexity for this condition). On the other hand, "same" responses in both conditions should increase similarly with increasing complexity if the incidence of tone migration does not depend on stimulus position or order.

The DISTRACT-NO.FUSE and DISTRACT-FUSE.2ND Conditions evaluated the remote possibility that subjects in Experiment 1 were distracted, responding inappropriately to the perception of tones received in an ear other than the target ear (e.g., responding to the contralateral ear). Distraction was evaluated by presenting the bases or chords of AX stimuli to the contralateral ear rather than to the target ear. Both the DISTRACT-NO.FUSE and the DISTRACT-FUSE.2ND Conditions were generated by reversing the ears to which tones were presented in other dichotic conditions (Experiment 1's FUSE-NEITHER Condition and Experiment 2's DISSONANT-FUSE.2ND Condition, described below, respectively). "Same" responses in the DISTRACT-NO.FUSE Condition presumably would result only if subjects attended to and veridically perceived the tone complex presented to the contralateral ear. "Same" responses in the DISTRACT-FUSE.2ND Condition were thought primarily to reflect responding to the contralateral ear and fusion of distinguishing tone and chord in the X stimulus. In this condition, "same" responses also could theoretically reflect (1) migration or dominance of the isolated distinguishing tone in the A stimulus, or (2) TP of the X stimulus. However, tone migration in the A stimulus seemed unlikely, since subjects were probably aware that no tones were presented to the target ear. TP was also unlikely for reasons already discussed in Experiment 1.

The final two experimental conditions, the DISSONANT-FUSE.1ST and DISSONANT-FUSE.2ND Conditions, evaluated the likelihood of subjects fusing contralateral distinguishing tones to perceive dissonant chords. The evaluation was conducted individually for A and X stimuli by reversing the order of stimuli across the two conditions. Assuming veridical perception of the other trial stimulus, "same" responses would be obtained if subjects fused the contralateral distinguishing tone and chord (or, less likely, exhibited TP) of (1) the A stimulus in the DISSONANT-FUSE.1ST Condition, and (2) the X stimulus in the DISSONANT-FUSE.2ND Condition. Based upon the previously cited Gestalt literature on figural goodness, "same" responses were not expected to occur frequently. As previously noted, increases in "same" responses as a function of increasing complexity would be consistent with decreasing figural goodness as base complexity increases; the reverse pattern should be observed if figural goodness increases with increasing complexity.

In  the  above  experimental  conditions,  almost  all  (likely)  perceptual  organizations  would  result  in  "different"  responses 
by  subjects,  and  those  organizations  which  would  be  reflected  by  "same"  responses  were  not  hypothesized  to  occur  frequently. 
The  final  two  dichotic  conditions,  therefore,  were  designed  to  consistently  result  in  "same"  responses,  thus  lowering  the  likelihood 
of  consistent  "different"  responding,  and  acting  as  control  conditions  to  determine  if  subjects  were  exhibiting  a  response  bias. 

The DISTRACT-CONTROL Condition was generated from the FUSE-EITHER Condition stimuli of Experiment 1, but presented bases to the contralateral ear and distinguishing tones to the target ear. Subjects presumably would not respond "different," since such responses would reflect not only distraction to the contralateral ear, but also fusion of either the A or the X stimulus (not both).

In  the  NO.FUSE-CONTROL  Condition,  the  same  chord  was  presented  to  the  target  ear  for  both  A  and  X  stimuli, 
along  with  a  dissonant  contralateral  distinguishing  tone  in  the  A  stimulus.  "Different"  responses  only  would  be  obtained  if 
subjects (1) were distracted to the contralateral ear, or (2) exhibited fusion or TP of the A stimulus, neither of which was
hypothesized  to  occur. 

Results  and  Discussion 

Binaural  Discrimination.  Mean  subject  accuracy  (in  percent  correct)  is  shown  for  each  binaural  condition  and  level  of 
complexity  in  the  upper  panel  of  Table  5.  Subjects  again  discriminated  isolated  tones  with  perfect  accuracy  and  maintained  high 
accuracy  levels  of  chord  discrimination. 

Chord discrimination results initially were analyzed in a 4 × 3 ANOVA, with condition (same and different trials for both major/minor and dissonant chord discrimination) and complexity (number of tones) as the respective factors. The main effect of condition was not significant (F(3,15) = .514, p = .6790). There was a significant main effect of complexity (F(2,10) = 7.398, p = .0107), as well as a marginal condition × complexity interaction (F(6,30) = 2.141, p = .0776).

The  table  of  means  reveals  that  both  the  complexity  main  effect  and  the  marginally  significant  interaction  can  be 
attributed  primarily  to  decreasing  accuracy  with  increasing  complexity  on  different  chord  trials.  This  trend  was  revealed  by  an 
analysis of simple main effects of complexity for each condition, which were significant, or approached significance, for different-chord trials (F(2,10) = 3.879, p = .057 for major/minor chord discrimination; F(2,10) = 5.309, p = .027 for dissonant chord discrimination), but failed to approach significance for same-chord trials (F(2,10) = 1.404, p = .290 for major/minor chord discrimination; F(2,10) = 1.400, p = .291 for dissonant chord discrimination).

These  results  are  consistent  with  the  binaural  findings  of  Experiment  1  in  indicating  an  increased  difficulty  in 
perceptually  isolating  individual  tones  as  the  number  of  tones  present  in  a  complex  stimulus  increases.  Binaural  chord 
discrimination  performance  again  was  used  to  provide  error  rates  for  dichotic  discrimination  performance,  enabling  what,  in 
theory,  should  be  an  accurate  evaluation  of  dichotic  perceptual  organization. 

Dichotic Discrimination. Mean percentage of "same" responses for each dichotic condition and level of complexity is shown in the lower panel of Table 5. A 6 × 3 ANOVA with experimental dichotic condition and complexity as respective factors was first conducted. All effects were significant: the main effect of complexity (F(2,10) = 13.364, p = .0015), the main effect of condition (F(5,25) = 40.71, p < .0001), and their interaction (F(10,50) = 6.671, p < .0001).

Insert  Table  5 

The main effect of base complexity and the interaction can be attributed to the conditions evaluating migration, dominance, or fusion of contralateral distinguishing tones, as reflected by significant simple main effects of complexity for the MIGRATE-1ST (F(2,10) = 13.527, p < .001), MIGRATE-2ND (F(2,10) = 11.740, p < .002), and DISSONANT-FUSE.1ST Conditions (F(2,10) = 4.137, p = .049). The simple main effect of complexity failed to reach significance for the DISSONANT-FUSE.2ND Condition (F(2,10) = 2.411, p = .139). The simple effects reveal significant overall increases in percent "same" responses with increasing complexity (… increases in "same" responses with increasing complexity did not reach significance by Tukey tests).

As expected, subjects almost exclusively responded "same" on DISTRACT-CONTROL trials, where all target ear components for A and X stimuli were identical. Therefore, in the experimental dichotic conditions, subjects seemingly were not responding "different" on the basis of a simple response bias, nor were they responding to percepts at inappropriate locations.

The unexpectedly frequent "different" responses on NO.FUSE-CONTROL trials are attributed to a tendency to fuse contralateral distinguishing tones (see below).

As expected, the simple main effects of base complexity did not approach significance in the DISTRACT-NO.FUSE and DISTRACT-FUSE.2ND Conditions (F(2,10) = .301, p = .747, and F(2,10) = 1.189, p = .344, respectively). Therefore, responding to the contralateral ear (distraction) seldom occurred and was not made more likely by increasing complexity. Furthermore, the incidence of distraction across all levels of complexity was always within the mean error rates for subjects on binaural discrimination trials, indicating that subjects were not distracted into responding on the basis of contralateral or mislocalized information. These results are important because the incidence of TP should not exceed the rate of responding to mislocalized information.

Tukey comparisons for the MIGRATE-1ST and MIGRATE-2ND Conditions reveal that the simple effects were attributable to significant increases in "same" responses as complexity increased from 2-tone to 3-tone bases. The probabilities of migration/dominance for A and X stimuli with increasing complexity, shown in panel 3 of Table 4, were calculated (using Appendix Eq. 5a') to increase substantially from 2- to 3-tone complexity, but then to decrease slightly at the highest level of complexity. This same pattern of results was observed for both A and X stimuli. Thus, regardless of when stimuli are presented on a trial, increased migration or dominance with increasing complexity may asymptote when the number of tones is large enough to make it difficult to perceptually isolate individual tones.

Because they would indicate either TP or fusion of contralateral distinguishing tones to perceive figurally bad chords, "same" responses in the DISSONANT-FUSE-1ST and DISSONANT-FUSE-2ND Conditions (as well as in the NO-FUSE-CONTROL Condition) were not hypothesized to occur frequently. The high percentage of "same" responses obtained across levels of complexity was unexpected, as was the similar tendency to respond "different" on roughly half of all trials in the NO-FUSE-CONTROL Condition. The accompanying increase in "same" responses with increasing complexity on DISSONANT-FUSE trials (significant for the DISSONANT-FUSE-1ST Condition) further suggests that fusion increased with increasing complexity. "Same" responses in the DISSONANT-FUSE-1ST and DISSONANT-FUSE-2ND Conditions again could not have been due to TP unless subjects responded to the location of the triplex percepts rather than to veridical target ear perception. However, we already have noted that subjects exhibited extremely low probabilities of responding to inappropriately localized percepts. Subjects therefore were most likely responding to target ear rather than triplex percepts. If we assume that TP is an unlikely alternative to musical DP, we can calculate the probability of fusing contralateral distinguishing tones (using Eq. 5b') to perceive a dissonant chord. As shown in panel 4 of Table 4, these probabilities, averaged across A and X stimuli, increased from the 2- to the 3-tone level of base complexity and remained stable at the 4-tone level (0.65, 0.80, and 0.79, respectively).
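The averaged values above can be reproduced from the per-stimulus fusion estimates listed in the Appendix (Eq. 5b': .65, .67, and .88 for A; .65, .93, and .70 for X). A minimal sketch, assuming panel 4 reports a simple unweighted mean of the A and X estimates rounded to two decimals (the variable and function names are illustrative only):

```python
# Per-stimulus probabilities of fusing contralateral distinguishing tones
# (Appendix, Eq. 5b'), listed for 2-, 3-, and 4-tone bases.
FUSION_A = [0.65, 0.67, 0.88]
FUSION_X = [0.65, 0.93, 0.70]


def average_across_positions(a_probs, x_probs):
    """Unweighted mean of the A- and X-stimulus estimates at each complexity level."""
    return [round((a + x) / 2, 2) for a, x in zip(a_probs, x_probs)]


averaged = average_across_positions(FUSION_A, FUSION_X)
print(averaged)
```

Under this assumption the means come out at .65, .80, and .79, matching the values reported for panel 4.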

Means  from  both  MIGRATE  and  DISSONANT  Conditions  from  Experiment  2  were  submitted  to  a  probability 
formula  (Appendix  Eq.  6)  to  calculate  the  overall  likelihood  of  altered  target  ear  perception.  Probabilities  were  calculated 
individually  for  A  and  X  stimuli  at  each  level  of  complexity  and  are  displayed  in  the  bottom  panel  of  Table  4.  Stimulus 
position  (A  or  X)  did  not  seem  to  play  a  critical  role  in  target  ear  perception  of  the  current  stimuli,  since  no  overall  systematic 
bias  for  altered  target  ear  perception  of  the  X  stimulus  was  found. 

The obtained probabilities were averaged across stimulus position to obtain overall probabilities of altered target ear perception for each level of complexity. These probabilities were 0.68 for 2-, 0.93 for 3-, and 0.90 for 4-tone bases. As complexity increases, so does the likelihood of altered target ear perception. The probabilities at the two higher levels of complexity are comparable to the accuracy rates for binaural discrimination trials and are therefore essentially at ceiling.

General  Discussion 

Having quantified the probabilities of various perceptual organizations (i.e., fusion, migration, dominance, TP, distraction), we must explain not only why specific organizations occurred, but also why their probabilities changed as a function of stimulus complexity. The following discussion develops general perceptual explanations for the increased tendency to alter target ear perception with increasing complexity. These include (1) an application of Gestalt principles, (2) an evaluation of the relative salience of contralateral tones based on related speech research, and (3) a comparison with a well-known attentional model which addresses similar findings with visual stimuli.

The observed probabilities of altered target ear perception cannot be due merely to subjects comparing the overall perceived similarity of stimulus configurations. If responses were based solely on the perceived similarity of A and X stimuli, then sufficient increases in complexity should have resulted in an increasing tendency to perceive different stimuli as similar. Similarity then should have affected perception of binaural and dichotic stimuli in an equivalent manner; increases in "same" responses for dichotic conditions would have been comparable to or less than the measured decrease in binaural discrimination performance for similar stimuli. This, however, was clearly not the case. Altered target ear perception in dichotic trials approached a maximal rate, whereas error rates on binaural different-chord trials remained quite low.

Alternatively, one could argue that the effects of complexity on altered target ear perception are attributable to memory decay. The current discrimination task probably does involve a strong memory component. Subjects must compare the X stimulus with a memorial representation of the A stimulus. This representation could be either (1) a trace (or image) of A, or (2) an encoded version of A, respectively equivalent to the product of "trace" and "context coding" processes (Braida and Durlach, 1986, as summarized by Macmillan, Braida, and Goldberg, 1987). Short ISIs and relatively simple stimuli (e.g., with few components) are required for maintaining an adequate memory trace; otherwise, the trace will substantially decay. Since piano tones rapidly achieve a steady-state spectral composition which remains relatively stable (apart from gradual amplitude decay) until offset, the current task would minimally require subjects to compare spectral properties at X stimulus onset with analogous properties at A stimulus offset. The minimum memory interval therefore becomes the 1.5 s ISI. Since this interval is comparable to estimates of the upper bound of echoic memory (e.g., Darwin, Turvey, and Crowder, 1972; Treisman and Rostron, 1972), the A stimulus trace could have decayed sufficiently to hamper its comparison with the X stimulus. Trace decay of veridical perception thus could appear as altered target ear perception. Furthermore, because the number of stimulus components grows with base complexity, trace decay should be more pronounced as base complexity increases.

Since its effectiveness is not influenced by ISI, context coding would be the more viable memory strategy. However, context coding also should be more difficult as stimulus complexity increases. Thus, effects of increasing stimulus (base) complexity are predicted not only by a greater likelihood of trace decay, but also by inappropriate context coding.

Implicitly, this memory model predicts that context coding should be easier for figurally good, and thus more easily encoded, stimuli. Indeed, figurally good stimuli are generally argued to require less information to be represented, while also being more resistant to altered perception. Trace decay, reflecting one mechanism of altered perception, therefore should be less likely to alter the perception of figurally good stimuli. Thus, memory decay may provide merely a variant of explanations based upon changes in figural goodness as a function of complexity (as we originally hypothesized).

Gestalt  Principles 

Let us momentarily assume that figural goodness for chords is positively correlated with the degree of harmonic consonance produced by simultaneously presented tones. Dowling and Harwood (1986) note that chord stability should decrease with increasing dissonance between tones. Since the frequencies of adjacent tones in a musical scale do not form a simple integer ratio, adjacent tones should evoke dissonance. Dissonant chords are unstable in tonal music, requiring immediate resolution to simpler frequency ratios. Dissonance was present in both 3-tone and (possibly to a greater extent) 4-tone bases. Overall dissonance therefore increases, and assumed figural goodness decreases, with increasing complexity for the current stimuli.
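The claim that adjacent scale tones lack a simple integer ratio can be illustrated numerically. The sketch below assumes 12-tone equal temperament (as on a modern piano) and uses a crude, hypothetical simplicity measure: the ratio p:q with the smallest p + q that lies within 1% of the tempered interval. It is an illustration only, not a model of consonance used in this study.

```python
from fractions import Fraction


def equal_tempered_ratio(semitones: int) -> float:
    """Frequency ratio of an interval in 12-tone equal temperament (assumed tuning)."""
    return 2 ** (semitones / 12)


def simplest_ratio(ratio: float, tol: float = 0.01) -> Fraction:
    """Return the fraction p/q with the smallest p + q whose value lies within
    a relative tolerance `tol` of `ratio` (a hypothetical 'simplicity' measure)."""
    total = 2
    while True:
        for q in range(1, total):
            p = total - q
            if abs(p / q - ratio) / ratio <= tol:
                return Fraction(p, q)
        total += 1


# Interval sizes in semitones; a minor second separates adjacent chroma.
INTERVALS = {
    "minor second (adjacent)": 1,
    "major second": 2,
    "minor third": 3,
    "major third": 4,
    "perfect fifth": 7,
}

for name, steps in INTERVALS.items():
    r = equal_tempered_ratio(steps)
    f = simplest_ratio(r)
    print(f"{name:25s} {r:.4f} ~ {f.numerator}:{f.denominator}")
```

Under these assumptions the perfect fifth, major third, and minor third are matched by 3:2, 5:4, and 6:5, whereas the semitone between adjacent chroma needs 16:15, a far less simple ratio.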

Three  perceptual  trends  would  be  expected  in  the  current  experiments  if  figural  goodness  decreases  as  a  function  of 
increasing  complexity.  First,  if  bases/chords  with  many  component  tones  represent  weak  figures,  then  these  figures  should  be 
poorly  represented  by  the  listener.  Therefore,  as  stimuli  become  more  complex,  and  thus  are  represented  less  adequately,  stimuli 
should  increasingly  be  perceived  as  similar  and  also  (as  noted)  more  subject  to  memory  limitations.  Accuracy  in  binaural  chord 
discrimination  then  should  increase  with  increasing  complexity  for  same  chord  trials  and  decrease  for  different  chord  trials, 
reflecting  increased  overall  perceived  similarity  of  chords  as  more  component  tones  are  presented.  With  the  sole  exception  of 
same  chord  binaural  discrimination  trials  in  Experiment  2,  this  trend  was  observed. 

Second, fusion of a contralateral tone with a base should decrease (favoring veridical perception) as complexity increases. This trend was observed in the calculated probabilities of fusion (from Eq. 3') for the FUSE-EITHER and FUSE-NEITHER Conditions in Experiment 1.

Finally, when contralateral distinguishing tones are presented simultaneously (one of them physically mixed with the base), there should be an increasing tendency to alter target ear perception with increasing complexity even though the target ear received a full (but now less stable) chord. Again, this trend was confirmed by the calculated probabilities of altered target ear perception (from Eq. 6) for the MIGRATE and DISSONANT-FUSE Conditions in Experiment 2. Findings from both experiments therefore suggest that figural goodness may decrease as the number of tones (representing unique notes in a musical scale) increases. However, while invoking changes in figural goodness may allow a post hoc explanation of complexity effects, it does not explain how target ear perception is systematically altered. We therefore consider several general theoretical alternatives.

Stimulus  Dominance 

Altered target ear perception might be argued to reflect stimulus dominance effects rather than the fusion/migration of distinguishing tones, since target ear perception would be the same under both conditions. Repp (1978a, b, and c) identified several dichotic pairs of CV syllables from a /ba/-/da/-/ga/ continuum which perfectly fused. Identification of fused pairs reflected the perceptual dominance of one syllable over its contralateral CV, i.e., only the dominant consonant was perceived. Repp suggested that such dominance may result from differences in relative amplitudes and frequencies of dichotic components, with subjects showing some bias (for bleats and CV syllables) to respond on the basis of the lower-frequency member of a dichotic pair.

Repp argued that stimulus dominance represents a lower-level, general auditory phenomenon. Dominance therefore should be expected for both speech and nonspeech stimuli, as Repp found for related stimuli which differed in how closely they sounded like speech [in decreasing order of "speech-likeness": two-formant CV syllables, bleats (F2), transitions (from F1 and F2), chirps (F2 transitions), and timbres (F2 steady states)]. In the current study, dominance could conceivably have occurred when two distinguishing tones were presented to separate ears. All stimuli (tones, bases, and chords) were presented at equal amplitude. As a result, a distinguishing tone presented in isolation (to the contralateral ear) was more intense than the same distinguishing tone mixed with a base (in the target ear). Thus, isolated distinguishing tones may have been perceived as the more salient, or even the only, distinguishing tones. Furthermore, with increasing complexity the distinguishing tone mixed with a base becomes less intense relative to the contralateral distinguishing tone, but still is equal in intensity to the other tones in the chord. Thus, dominance based on relative intensity also could predict base complexity effects.
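The intensity argument can be made concrete with a back-of-the-envelope calculation. A minimal sketch, assuming that "equal amplitude" means equal overall power, that the components of a chord have equal power, and that their powers add (reasonable for tones at different frequencies, although real piano tones need not have equal power):

```python
import math


def component_level_db(n_components: int) -> float:
    """Level of one component relative to an isolated tone of the same overall
    power, assuming n equal-power components whose powers add and a mixture
    scaled to the isolated tone's power: 10*log10(1/n) dB."""
    return -10 * math.log10(n_components)


# A distinguishing tone mixed with a 2-, 3-, or 4-tone base forms a 3-, 4-, or 5-tone chord.
for base_tones in (2, 3, 4):
    n = base_tones + 1
    print(f"{base_tones}-tone base: mixed tone at {component_level_db(n):.1f} dB re isolated tone")
```

Under these assumptions the mixed distinguishing tone sits about 4.8, 6.0, and 7.0 dB below the isolated contralateral tone for 2-, 3-, and 4-tone bases, so its level disadvantage grows with base complexity, as a relative-intensity account of dominance would require.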

Despite this possible confound, some data from Experiment 2 are inconsistent with stimulus dominance predictions. Due to the consistent differences in component amplitudes and frequencies across conditions at each level of complexity, stimulus dominance, if present, should have equally affected all stimuli in which contralateral distinguishing tones were presented. Monaural stimuli in the DISSONANT-FUSE Conditions (where both distinguishing tones were physically mixed with the base) should have been perceived veridically, since subjects performed accurately on binaural trials involving equivalent stimuli. The remaining stimulus with contralateral distinguishing tones should have resulted in the target ear perception of a major or minor chord. Resulting "different" responses therefore should have increased with increasing complexity, directly opposite to the obtained pattern of results. The high percentage of "same" responses therefore could not reflect stimulus dominance. Thus, it is unlikely that the observed base complexity effects were the result of stimulus dominance.

Feature  Integration  Theory 

An alternative explanation for the observed base complexity effects could include attentional constructs which predict systematic mislocalization and perceptual integration of stimulus components, and thus fusion of contralateral distinguishing tones. Such a model of attention, although not yet developed in the auditory literature, has been developed for visual stimuli by Treisman and her colleagues (Treisman, 1982; Treisman and Gelade, 1980; Treisman and Schmidt, 1982; Treisman and Gormican, 1988). Figure 1 outlines this model, called Feature Integration Theory (FIT). Individual features (values along a dimension, like color, shape, or size) of objects first are preattentively processed in parallel. Focusing attention at a particular location then integrates otherwise "free floating" features in order to perceive singular objects. FIT also predicts that features can be combined based upon top-down expectations of highly recurrent patterns, which probably are related to notions of figural goodness.


Insert  Figure  1 

One source of critical evidence for FIT assertions is the nature of perceptual errors and the circumstances in which they arise. Attention becomes overloaded when the number of presented items, and thus the number of features, is large. When an array of many items is searched for an object defined by a conjunction of features, "illusory conjunctions" (incorrect combinations of existing features) are likely to result. These errors are argued to represent either random couplings of features or violations of expectations, and they have been shown to occur far more frequently than errors in which a feature is conjoined with one not present in the array (Treisman and Schmidt, 1982). Furthermore, subjects are not aware of these frequent illusory conjunctions.

In order to apply FIT to the current finding of tone migration, some loose assumptions must be made about what qualifies as a musical feature. Let us assume that our auditory analyzers can preattentively identify all relevant information from the "musical array" in parallel, i.e., the pitches, timbres, durations, intensities, and locations of individual tones, as well as harmonic relational information between tones. According to FIT, at lower levels of complexity (i.e., when the number of presented items is small), subjects should be relatively successful at integrating features to perceive the appropriate chord veridically in the target ear. However, if complexity is high (i.e., given many components), attention may become overloaded. Subjects then would group components along shared features in an attempt to approximate the presented stimulus configuration. One shared feature is that both the E and E* distinguishing tones have a strong harmonic relationship with the tones in the base, combining with the base to produce a major and a minor chord, respectively. If the tones are grouped along this shared feature, mislocalization of E and E* would then be likely, resulting in illusory conjunctions in the form of migrations.

Fusion  of  contralateral  distinguishing  tones  could  be  considered  another  form  of  illusory  conjunction.7  If  the 
dissonance  created  by  the  simultaneous  presentation  of  adjacent  chroma  (E  and  E*)  is  coded  as  a  feature,  then  such  coding  may 
result  in  perceptual  fusion.  Since  base  tones  at  higher  levels  of  complexity  also  share  dissonance  and  increase  attention  load,  the 
likelihood  of  fusion  would  be  expected  to  increase  with  increasing  complexity. 

Illusory conjunctions represent objects which have been resynthesized from feature labels. This seems initially contradictory to the Gestalt principles, which stress holistic perception. However, Treisman (see above) has suggested that with sufficient experience, particular combinations of primitive features (e.g., simple, highly recurring figures) are processed as emergent features. It is reasonable to assume that figurally good (e.g., major/minor) chords could act as prototype features. Without this assumption, it is possible that the A stimulus would always act as a basis for search for the X stimulus. Subjects then would have altered target ear perception more frequently in the X stimulus than in the A stimulus. However, since no such tendency was observed, this alternative seems unlikely. Thus, figurally good chords may represent emergent features. If so, chords would be less likely to act as features at high levels of complexity, given the instability of chords containing dissonances. Altered target ear perception yet again would seem more likely with increasing complexity.

While initial FIT applications to effects of musical complexity are far from straightforward, FIT can account for the migration and fusion of contralateral distinguishing tones, the former of which is operationally identical to visual demonstrations of illusory conjunctions. In a future submission, we intend to present results which more directly verify the applicability of FIT to audition using tasks similar to those typically used in vision. Continued investigations within the framework of FIT should provide a better specification of musical features. Invoking FIT also may allow many auditory findings to be analyzed from a new attentional perspective.

Applications  and  Summary  of  Findings 

The current study suggests that research using DP stimuli can evaluate much more about auditory processing than simply addressing claims for or against modularity. Fusion incidence can be used to identify variables (e.g., complexity) which are critical to perceptual organization, as well as to reveal necessary conditions for various illusory percepts to occur.

The results represent a quantification of the probabilities of various perceptual organizations for musical stimuli as a function of stimulus complexity. Data from several conditions verified that fusion occurs at a substantial rate in musical DP stimuli, and were inconsistent with the postulation of TP by supporters of phonetic modularity. Furthermore, the incidence of nonveridical perception seems to be determined in part by the figural goodness of both the fused and base percepts, and may reflect organizational tendencies which were originally demonstrated in vision. Migration and fusion of chord distinguishing tones were demonstrated to increase as a function of increasing complexity. Taken in conjunction with the slight decrease in fusion of contralateral base and tone (to perceive a major/minor chord) as a function of complexity, migration/fusion of distinguishing tones is consistent with the notion that increasing complexity decreases figural goodness. Decreasing figural goodness, therefore, is argued to decrease the likelihood of veridical perception. Furthermore, migration and fusion are consistent with feature integration as conjectured by FIT.

The application of FIT to the current results suggests a generalized attentional basis for future (DP) studies of auditory perceptual organization. Attentional factors also may affect perceptual organization in the presence of variation along other stimulus variables (e.g., component duration, intensity, or relative frequency position), which merit further research.



References

[Author and year illegible]. An experimental study of the phenomenon of closure as a threshold function. Journal of Experimental Psychology, [volume and pages illegible].




Bregman, A. S. (1987). The meaning of duplex perception: Sounds and transparent objects. In M. E. H. Schouten (Ed.), The Psychophysics of Speech Perception. Boston: Martinus Nijhoff, NATO-ASI Series.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.

Ciocca, V., & Bregman, A. S. (1989). The effects of auditory streaming on duplex perception. Perception & Psychophysics, 46(1), 39-48.

Collins, S. C. (1985). Duplex perception with musical stimuli: A further investigation. Perception & Psychophysics, 38, 172-177.

Cutting, J. E. (1976). Auditory and linguistic processes in speech perception: Inferences from six fusions in dichotic listening. Psychological Review, 83, 114-140.

Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory analogue of the Sperling partial report procedure: Evidence for brief auditory storage. Cognitive Psychology, 3, 255-267.

Deutsch, D. (1974). An auditory illusion. Nature, 251, 307-309.

Dowling, W. J., & Harwood, D. L. (1986). Music cognition. New York: Academic Press.

Fowler, C. A., & Rosenblum, L. D. (1990). Duplex perception: A comparison of monosyllables and slamming doors. Journal of Experimental Psychology: Human Perception and Performance, 16(4), 742-754.

Hall, M. D., & Pastore, R. E. (1992). Musical duplex perception: Perception of figurally good chords with subliminal distinguishing tones. Journal of Experimental Psychology: Human Perception and Performance, 18(3), 752-762.

Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop consonants: Evidence for a phonetic mode. Perception & Psychophysics, 30, 133-143.

Liberman, A. M., & Mattingly, I. G. (1989a). A specialization for speech perception. Science, 243, 489-494.

Liberman, A. M., & Mattingly, I. G. (1989b). Motor theory of speech perception revisited. Cognition, 21, 1-36.

Macmillan, N. A., Braida, L. D., & Goldberg, R. E. (1987). Central and peripheral processes in the perception of speech and nonspeech sounds. In M. E. H. Schouten (Ed.), The Psychophysics of Speech Perception (pp. 28-45). Boston: Martinus Nijhoff, NATO-ASI Series.

Mann, V. A., Madden, J., Russell, J. M., & Liberman, A. M. (1981). Further investigation into the influence of preceding liquids on stop consonant perception. Proceedings of the 101st meeting of the Acoustical Society of America, 69(S1), S91(A).

Mattingly, I. G., & Liberman, A. M. (1988). Speech and other auditory modules. Haskins Laboratories Status Report on Speech Research #SR-93/94, 67-84.

Nusbaum, H., Schwab, E., & Sawusch, J. (1983). The role of "chirp" identification in duplex perception. Perception & Psychophysics, 33(4), 323-332.

Pastore, R. E., Schmuckler, M. A., Rosenblum, L., & Szczesiul, K. (1983). Duplex perception with musical stimuli. Perception & Psychophysics, 33, 469-474.

Rand, T. C. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678-680.

Repp, B. H. (1978a). Stimulus dominance in fused dichotic syllables. Haskins Laboratories Status Report on Speech Research #SR-55/56, 133-148.

Repp, B. H. (1978b). Categorical perception of fused dichotic syllables. Haskins Laboratories Status Report on Speech Research #SR-55/56, 149-161.

Repp, B. H. (1978c). Stimulus dominance and ear dominance in fused dichotic speech and nonspeech stimuli. Haskins Laboratories Status Report on Speech Research #SR-55/56, 163-179.

Repp, B. H., Milburn, C., & Ashkenas, J. (1983). Duplex perception: Confirmation of fusion. Perception & Psychophysics, 33, 333-357.

Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194-214.

Treisman, A. (1990). Variations on the theme of feature integration: Reply to Navon (1990). Psychological Review, 97(3), 460-463.

Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.

Treisman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141.

Treisman, A., & Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95(1), 15-48.

Treisman, M., & Rostron, A. B. (1972). Brief auditory storage: A modification of Sperling's paradigm applied to audition. Acta Psychologica, 36, 161-170.

Wertheimer, M. (1958). Principles of perceptual organization. In D. C. Beardslee & M. Wertheimer (Eds.), Readings in Perception. Princeton: Van Nostrand Company, Inc.

Whalen, D. H., & Liberman, A. M. (1987). Speech perception takes precedence over nonspeech perception. Science, 237, 169-171.




Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt.


Appendix:  Probabilities  for  Altered  Target  Ear  Perception 
Probabilities for altered target ear perception in both experiments were calculated from several formulae. Formulae were simplified by a few basic assumptions, which are specified for the given conditions. This appendix provides the derivation of the formulae used for each experiment and the probabilities estimated by inserting the actual results into these formulae.

Symbols used in the formulae are summarized in Table 6. Each symbol reflects the probability of a given perception or response. The probability formulae, shown in Table 7, are discussed below.

Each probability was modified by the corresponding accuracy/error rates on binaural chord trials, which represented a correction procedure for guessing (Woodworth & Schlosberg, 1954). In the formulae, a and b are the accuracy rates under physically same and different trials, respectively. These rates are set equal to the binaural accuracy rates for the given level of base complexity, and 1-a and 1-b are the corresponding binaural error rates under physically same and different trials. For each formula, three calculations can be performed, one for each level of base complexity.
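The guessing correction has the same structure in every formula: a "same" response is either a correct "same" (rate a) to identical percepts or an erroneous "same" (rate 1-b) to different percepts, so P("same") = a·p + (1-b)(1-p) for a perceptual probability p. A minimal sketch of inverting this mixture for a single unknown p, the form described for Eq. 2 below; the function names and the example rates are hypothetical, not values from this study:

```python
def p_same_response(p_same_percept: float, a: float, b: float) -> float:
    """Probability of a 'same' response: correct 'same' (rate a) to same percepts
    plus erroneous 'same' (rate 1 - b) to different percepts."""
    return a * p_same_percept + (1 - b) * (1 - p_same_percept)


def corrected_probability(p_same: float, a: float, b: float) -> float:
    """Invert the mixture above: p = (P(same) - (1 - b)) / (a + b - 1)."""
    denom = a + b - 1
    if denom <= 0:
        raise ValueError("binaural accuracy must exceed chance (a + b > 1)")
    return (p_same - (1 - b)) / denom


# Hypothetical binaural accuracy rates and a simulated observed 'same' rate.
a, b = 0.95, 0.90
observed = p_same_response(0.6, a, b)
print(round(corrected_probability(observed, a, b), 6))  # recovers 0.6
```

The guard on a + b - 1 reflects that the correction is only defined when binaural discrimination is above chance.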


Insert  Table  6 


Experiment 1

"Same" responses in the FUSE-EITHER Condition reflected correct responses, a, to fusion of neither trial stimulus, F_n, or to fusion of both stimuli, F_b, as well as erroneous responses, 1-b, to fusion of only one stimulus, 1-(F_n + F_b). This probability is expressed in Eq. 1, with terms collected to obtain Eq. 1b. Eq. 1 has two unknown variables, the probability of fusing both stimuli (F_b) and the probability of fusing neither stimulus (F_n). "Same" responses to the FUSE-NEITHER Condition reflect a correct response to fusing neither stimulus or an incorrect response to all other perceptual events, giving rise to Eq. 2, which has one unknown variable, F_n. The right-hand side of Eq. 2 is equal to the bracketed term in Eq. 1b. Since stimuli in the FUSE-EITHER and FUSE-NEITHER Conditions are similar in all respects except for the isolated distinguishing tone in the X stimulus, it is assumed that fusing neither trial stimulus occurs at an equal rate in both conditions. Eq. 3 is obtained by substituting the directly measured probability, P(s | FUSE-NEITHER), for the bracketed term in Eq. 1b. This formula, rewritten as Eq. 3', solves for F_b, the probability of fusing both trial stimuli in the two conditions to perceive complete chords in the target ear. This probability, F_b, is [illegible], .42, and .44, respectively, for two-, three-, and four-tone bases. If we assume that fusions of the A and X stimuli in either condition are independent and equally probable events, the probability of fusing a single stimulus should be equivalent to the square root of the probability of fusing both stimuli, or .75, .65, and .66 as a function of increasing base complexity.
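Table 7 is not reproduced here, but the prose above pins down the algebra. Writing F_n and F_b for the probabilities of fusing neither and both stimuli (our labels), Eq. 1 reads P(s | FUSE-EITHER) = a(F_n + F_b) + (1-b)[1 - (F_n + F_b)], which collects to [a·F_n + (1-b)(1-F_n)] + (a+b-1)·F_b. The bracket is the right-hand side of Eq. 2, P(s | FUSE-NEITHER), so F_b = [P(s | FUSE-EITHER) - P(s | FUSE-NEITHER)] / (a+b-1). A sketch of this reconstruction and of the square-root step, offered as our reading of the text rather than the authors' Table 7:

```python
import math


def p_fuse_both(p_same_either: float, p_same_neither: float, a: float, b: float) -> float:
    """Reconstructed Eq. 3': F_b = [P(s|FUSE-EITHER) - P(s|FUSE-NEITHER)] / (a + b - 1)."""
    return (p_same_either - p_same_neither) / (a + b - 1)


def p_fuse_single(p_both: float) -> float:
    """If A and X fusions are independent and equally likely, P(both) = p**2,
    so the single-stimulus probability is sqrt(P(both))."""
    return math.sqrt(p_both)


# Reported F_b for 3- and 4-tone bases (the 2-tone value is illegible in the source).
for p_both in (0.42, 0.44):
    print(p_both, round(p_fuse_single(p_both), 2))
```

The square roots of .42 and .44 round to .65 and .66, matching the single-stimulus values given above.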


Insert  Table  7 


The probability of subjects responding "same" to target ear components for the FUSE-1 Condition, expressed in Eq. 4, depends on a correct response to the combination of veridical target ear perception for the A stimulus and fusing the X stimulus to perceive a complete chord in the target ear (i.e., F_x·V_a), plus an erroneous response to all other conditions. By solving for F_x·V_a as a single term, Eq. 4 permitted solution for the minimum estimated probabilities of veridical A stimulus perception, V_a, and fusing the X stimulus, F_x, in the FUSE-1 Condition. This minimum incidence of V_a and F_x was estimated to be .21, .28, and .55 as a function of increasing complexity. Because the physical stimuli used as A and X in this condition are not equivalent, probabilities of any perceptual organization are not transferable across stimuli.

Experiment  2 

Probabilities of migrating and fusing contralateral distinguishing tones were estimated by Eqs. 5a and 5b, respectively. These probabilities were calculated individually for the A (shown) and X stimuli. Formulae for X stimuli can be obtained by substituting the MIGRATE-2ND and DISSONANT-FUSE-2ND Conditions for the MIGRATE-1ST and DISSONANT-FUSE-1ST Conditions, respectively, and by exchanging the A and X subscripts.

The stimulus in these conditions that was presented monaurally (X for the displayed equations) was assumed always to be perceived veridically (i.e., $V_X = 1$). Substituting 1 for $V_X$ in Eqs. 5a and 5b yields the collapsed Eqs. 5a' and 5b'. As a function of increasing complexity, the calculated probabilities of migrating contralateral distinguishing tones (Eq. 5a') were .03, .62, and .59 for the A stimulus and .16, .68, and .48 for the X stimulus. Similarly, the probabilities of fusing contralateral distinguishing tones (Eq. 5b') were .65, .67, and .88 for the A stimulus and .65, .93, and .70 for the X stimulus.
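Eqs. 5a' and 5b' have the same form as Eq. 4, so the same inversion applies. In the sketch below, the "same" rates come from the dichotic rows of Table 5. One assumption needs flagging: a and b are taken from the dissonant-chord binaural rows of Table 5. That pairing reproduces every reported value after rounding, but it is our inference from the arithmetic, not something the text states.

```python
# Experiment 2 proportions from Table 5, ordered 2-, 3-, 4-tone bases.
# Assumption: a and b are the dissonant-chord binaural accuracies.
A_SAME_DISSONANT = [0.933, 0.878, 0.933]
B_DIFF_DISSONANT = [0.936, 0.867, 0.750]
S_MIGRATE = {"A": [0.090, 0.597, 0.650],          # MIGRATE-1ST
             "X": [0.206, 0.638, 0.579]}          # MIGRATE-2ND
S_DISSONANT_FUSE = {"A": [0.626, 0.632, 0.850],   # DISSONANT-FUSE-1ST
                    "X": [0.631, 0.827, 0.725]}   # DISSONANT-FUSE-2ND


def solve_single(s: float, a: float, b: float) -> float:
    """Eqs. 5a' and 5b': invert s = a*p + (1 - b)*(1 - p), with V = 1."""
    return (s - (1 - b)) / (a + b - 1)


def estimates(same_rates: list[float]) -> list[float]:
    """Apply the inversion at each base complexity."""
    return [solve_single(s, a, b)
            for s, a, b in zip(same_rates, A_SAME_DISSONANT, B_DIFF_DISSONANT)]


for stim in ("A", "X"):
    migrate = estimates(S_MIGRATE[stim])
    fuse = estimates(S_DISSONANT_FUSE[stim])
    print(stim, "migrate:", [f"{p:.2f}" for p in migrate],
          "fuse:", [f"{p:.2f}" for p in fuse])
```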

Finally, the probability of either fusing or migrating distinguishing tones (i.e., the likelihood of altering target ear perception) was calculated individually for each stimulus (A or X; shown for A) using Eq. 6. Calculations for X stimuli can be obtained by substituting X for A in the equation. Substituting the probabilities of fusing and migrating contralateral distinguishing tones obtained from Eqs. 5a' and 5b' into Eq. 6 produced the following probabilities of altered base ear perception for two-, three-, and four-tone bases, respectively: .66, .88, and .95 for the A stimulus; .71, .98, and .84 for the X stimulus; and .69, .93, and .90 averaged across A and X stimuli.
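Eq. 6 is the probability of the union of two independent events. The sketch below chains it onto the Eq. 5a'/5b' inversion so that it stands on its own. It carries over the same assumption that a and b are the dissonant-chord binaural accuracies from Table 5.

```python
# Experiment 2 proportions from Table 5 (2-, 3-, 4-tone bases).
# Assumption: a and b are the dissonant-chord binaural accuracies.
A = [0.933, 0.878, 0.933]
B = [0.936, 0.867, 0.750]
SAME = {
    "A": {"migrate": [0.090, 0.597, 0.650], "fuse": [0.626, 0.632, 0.850]},
    "X": {"migrate": [0.206, 0.638, 0.579], "fuse": [0.631, 0.827, 0.725]},
}


def solve_single(s, a, b):
    """Eqs. 5a'/5b': invert s = a*p + (1 - b)*(1 - p)."""
    return (s - (1 - b)) / (a + b - 1)


def fuse_or_migrate(f, m):
    """Eq. 6: P(F or M) = F + M - F*M, treating F and M as independent."""
    return f + m - f * m


altered = {}
for stim, rates in SAME.items():
    altered[stim] = [
        fuse_or_migrate(solve_single(sf, a, b), solve_single(sm, a, b))
        for sf, sm, a, b in zip(rates["fuse"], rates["migrate"], A, B)
    ]
    print(stim, [f"{p:.2f}" for p in altered[stim]])
```

The A and X rows round to the values reported above. Averaging the unrounded two-tone values gives .68. The reported .69 is the mean of the rounded .66 and .71. The three- and four-tone means (.93, .90) come out the same either way.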


Perceptual  Organization 




Acknowledgments 

Based upon work supported by National Science Foundation Grant BNS 8911456 and Grants F49620-93-1-0033 and F49609310327 from the Air Force Office of Scientific Research. Opinions, findings, conclusions, and recommendations are the authors' and do not necessarily reflect the views of the granting agencies. We gratefully acknowledge the following colleagues for their comments and suggestions on drafts of the manuscript: Richard Fahey, Xiao-Feng Li, Dawn G. Blasko, Wenyi Huang, and Jennifer L. Cho. Requests for reprints should be sent to either Michael D. Hall or Richard E. Pastore at the Department of Psychology, State University of New York at Binghamton, Binghamton, New York, 13902-6000.


Endnotes 

1. Collins (1985) found a reduced rate of musical DP when contralateral components were asynchronous by 125 ms, whereas smaller asynchronies between chirp and base in DPS result in segregation (Cutting, 1976). The Collins findings could be argued to reflect reliance on different integration cues in musical and speech DP. In speech DP, distinguishing transitions terminate with the onset of the corresponding formant in the base. Fusion is based primarily on the good continuation of frequency over time between the transition and its corresponding formant. DPS, therefore, should be disrupted by component stimulus asynchrony. In musical DP, distinguishing tones are normally presented for the duration of the base. Fusion is based on the relationship between component frequencies, and DP, therefore, should be less affected by small degrees of component asynchrony. Alternatively, it could be argued that these findings reflect differences in the use of information by speech and nonspeech systems.

2. The predicted role of figural goodness is actually more complicated than has been expressed. Clearly, the fusion rate depends on the difference in degree of closure between the base and fused stimuli. Thus, if increasing complexity decreases the articulation of both base and fused stimuli, but more so for the base, the rate of fusion could still increase. Our current interest is in determining the nature of the relationship between stimulus complexity and the rate of altered perception. The issue of figural goodness will then be addressed indirectly on the basis of the obtained pattern of results.

3. Data from an additional 21 subjects were discarded because they did not meet the a priori performance criterion. The large dropout rate is generally attributed to the extremely limited musical experience of the original subjects. These subjects were self-selected and often failed to meet the minimal musical experience criteria for participation, although there clearly was also an occasional inattentive subject.

4. It is now generally accepted that the syllabic percept in DPS results from a perceptual fusion of chirp and base (Repp, Milburn, and Ashkenas, 1983), rather than from a cognitive integration of the separately perceived and presumably categorized components (Nusbaum, Schwab, and Sawusch, 1983). The argument for perceptual fusion provides a strong basis for assuming that response probabilities should be equal for physically mixed (binaural) and perceptually mixed (fused) stimuli.

5.  There  was  again  a  high  drop-out  rate  (eighteen  subjects)  due  to  the  inability  of  subjects  to  accurately  discriminate  binaural 
major  and  minor  chords  and  thus  meet  our  a  priori  criteria  for  participation. 

6. It could be argued that the consistently obtained maximum probability of altered target ear perception (via migration or fusion of contralateral distinguishing tones) for 3-tone, rather than 4-tone, base stimuli reflects minimal figural goodness for 3-tone base stimuli. In the current major/minor sixth chords, the two highest frequency components of the 3-tone base are not widely separated in frequency, relative to the component tones of either the 2- or the 4-tone bases. As a result, overall perceived dissonance for 3-tone bases may actually exceed that for 4-tone bases. This would make altered target ear perception more likely despite the reduced level of base complexity.


Although fusion could be considered an example of an illusory conjunction, our current focus is narrower.




Anticipated Perceptual Structures for Stimuli

  Stimuli           Fusion              Migration           Dominance           Triplex Perception
  t.e. || c.e.      t.e. || c.e.        t.e. || c.e.        t.e. || c.e.        t.e. | between | c.e.

  C-G || E          C-E-G || E          -                   -                   C-G | [C-E-G] | E
  C-G || E'         C-E'-G || E'        -                   -                   C-G | [C-E'-G] | E'
  C-E-G || E'       C-E'-E-G || E'      C-E'-G || E         C-E'-G || E'        C-E-G | [C-E'-E-G] | E'
• C-E'-G || E       C-E'-E-G || E       C-E-G || E'         C-E-G || E          C-E'-G | [C-E'-E-G] | E
• E || C-G          E || C-E-G          -                   -                   E | [C-E-G] | C-G
• E' || C-G         E' || C-E'-G        -                   -                   E' | [C-E'-G] | C-G
• C-E'-E-G ||       -                   -                   -                   -
• || C-E'-E-G       -                   -                   -                   -

•Additional Experiment 2 stimuli.


Table 1. Possible perceptual organizations for two-tone base stimuli presented in dichotic discrimination trials. Three- and four-tone bases can be obtained by adding "A" and "B-D", respectively, to the C-G base shown. Double lines represent a midpoint between the two ear locations. Target ear (t.e.) stimuli are displayed to the left of the lines; contralateral ear (c.e.) stimuli are displayed to the right.


Possible Responses and Perceptual Organizations for Dichotic Conditions

                                                          Predicted Responses [same (s) / different (d)]
Condition             A Stim.           X Stim.           Veridical/   Fusion/       Migration/    Distraction
                                                          TP in t.e.   triplex       Dominance     to c.e.
                                                                       percept
                      t.e. || c.e.      t.e. || c.e.                   A  X  Both    A  X  Both

Experiment 1
FUSE-EITHER           C-G || E          C-G || E          s            d  d  s       -  -  -       s
FUSE-NEITHER          C-G || E          C-G || E'         s            d  d  d       -  -  -       d
FUSE-1                C-E-G || E'       C-G || E          d            d  s  d       d  -  -       d
FUSE-BOTH             C-E-G || E'       C-E'-G || E       d            d  d  s       s  s  d       d

Experiment 2
MIGRATE-1ST           C-E'-G || E       C-E-G ||          d            d  -  -       s  -  -       d
MIGRATE-2ND           C-E-G ||          C-E'-G || E       d            -  d  -       -  s  -       d
DISSONANT-FUSE-1ST    C-E-G || E'       C-E'-E-G ||       d            s  -  -       d  -  -       d
DISSONANT-FUSE-2ND    C-E'-E-G ||       C-E-G || E'       d            -  s  -       -  d  -       d
DISTRACT-NO-FUSE      E || C-G          E' || C-G         d            d  d  d       -  -  -       s
DISTRACT-FUSE-2ND     || C-E'-E-G       E' || C-E-G       d            -  d  -       -  d  -       s (w/ fuse X)
DISTRACT-CONTROL      E || C-G          E || C-G          s            s  s  s       -  -  -       s
NO-FUSE-CONTROL       C-E-G || E'       C-E-G ||          s            d  -  -       d  -  -       d
Table 2. Dichotic conditions for both experiments and the responses predicted by the various possible perceptual organizations of the stimuli. Again, stimuli are displayed only for two-tone bases and one combination of distinguishing tones.
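The fusion columns of Table 2 follow from a simple rule. If a stimulus fuses, its contralateral tone joins the target-ear chord. The predicted response is "same" only when the two resulting target-ear percepts match. The Python sketch below applies that rule to the Experiment 1 conditions. It treats chords as sets of note names and ignores migration, dominance, and response bias. The stimulus assignments are our reading of Table 2, and the function names are hypothetical.

```python
# Each stimulus is (target-ear notes, contralateral-ear notes); "E'" denotes the
# altered distinguishing tone written E' in Tables 1 and 2.
CONDITIONS = {
    "FUSE-EITHER":  ((("C", "G"), ("E",)), (("C", "G"), ("E",))),
    "FUSE-NEITHER": ((("C", "G"), ("E",)), (("C", "G"), ("E'",))),
    "FUSE-1":       ((("C", "E", "G"), ("E'",)), (("C", "G"), ("E",))),
    "FUSE-BOTH":    ((("C", "E", "G"), ("E'",)), (("C", "E'", "G"), ("E",))),
}


def target_percept(target, contra, fused):
    """Target-ear chord; a fused contralateral tone joins the target chord."""
    return frozenset(target) | frozenset(contra) if fused else frozenset(target)


def predicted_response(a_stim, x_stim, fuse_a, fuse_x):
    """'s' if the A and X target-ear percepts match, otherwise 'd'."""
    same = target_percept(*a_stim, fuse_a) == target_percept(*x_stim, fuse_x)
    return "s" if same else "d"


OUTCOMES = {"veridical": (False, False), "fuse A": (True, False),
            "fuse X": (False, True), "both": (True, True)}

for name, (a_stim, x_stim) in CONDITIONS.items():
    row = {label: predicted_response(a_stim, x_stim, fa, fx)
           for label, (fa, fx) in OUTCOMES.items()}
    print(name, row)
```

The printed rows reproduce the Veridical and Fusion columns for the four Experiment 1 conditions. Extending the rule to migration would require specifying which target-ear tone the migrating tone replaces.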




Discrimination Performance (Percent Correct)

                                              Base Complexity
Type of Trial   Condition                2-tone        3-tone        4-tone

Binaural        Same tone                           100.0 (0)
                Different tone                      100.0 (0)
                Same chord               89.8 (3.6)    92.6 (2.4)    97.0 (3.0)
                Different chord          92.2 (2.8)    90.9 (3.2)    81.8 (4.6)

                                         (Percent "Same" Responses)
Dichotic        FUSE-EITHER              98.2 (1.0)    97.7 (1.8)    97.3 (2.3)
                FUSE-NEITHER             51.7 (10.0)   62.6 (7.3)    62.8 (7.4)
                FUSE-1                   25.3 (7.6)    32.8 (6.1)    61.4 (6.2)
                FUSE-BOTH                19.7 (4.5)    50.0 (7.5)    66.1 (8.9)

Table 3. Discrimination performance for binaural trials (percent correct) and dichotic trials (percent "same" responses) for subjects in Experiment 1. Standard errors are shown in parentheses.


Table  4  Calculated  probabilities  of  fusion  and  migration  for  Experiments  1  and  2. 


                                                 Base Complexity (# tones)
Probability                                      2       3       4       Equation

Stimuli: Base | Distinguishing Tone (e.g., C-G | E)
EXPERIMENT 1:
  Fuse A and X                                   .57     .42     .44     3'
  Fuse A or X                                    .75     .65     .66     3'

Stimuli: Base + Tone | Other Tone (e.g., C-E-G | E')
EXPERIMENT 1:
  Veridical Perception (minimum estimate)        .21     .28     .55     4
EXPERIMENT 2:
  Migrate A                                      .03     .62     .59     5a'
  Migrate X                                      .16     .68     .48     5a'
  Mean                                           .10     .65     .54
  Fuse A                                         .65     .67     .88     5b'
  Fuse X                                         .65     .93     .70     5b'
  Mean                                           .65     .80     .79
  Migrate or fuse A                              .66     .88     .95     6
  Migrate or fuse X                              .71     .98     .84     6
  Mean                                           .69     .93     .90



Discrimination (Percent Correct)

                                                  Base Complexity
Type of Trial   Condition                    2-tone        3-tone        4-tone

Binaural        Same tone                               100.0 (0)
                Different tone                          100.0 (0)
                Same chord                   91.7 (4.0)    95.0 (3.4)    86.3 (6.1)
                Different chord              94.2 (2.6)    96.7 (2.1)    83.2 (8.0)
                Same dissonant chord         93.3 (6.7)    87.8 (5.8)    93.3 (6.7)
                Different dissonant chord    93.6 (4.7)    86.7 (3.0)    75.0 (4.3)

                                             (Percent "Same" Responses)
Dichotic        MIGRATE-1ST                  9.0 (3.7)     59.7 (8.8)    65.0 (13.9)
                MIGRATE-2ND                  20.6 (5.1)    63.8 (5.8)    57.9 (12.6)
                DISTRACT-NO-FUSE             1.8 (1.8)     4.1 (2.3)     3.7 (3.7)
                DISTRACT-FUSE-2ND            9.8 (7.0)     4.5 (2.5)     6.3 (4.1)
                DISSONANT-FUSE-1ST           62.6 (10.3)   63.2 (7.6)    85.0 (6.9)
                DISSONANT-FUSE-2ND           63.1 (7.0)    82.7 (3.3)    72.5 (8.1)
                DISTRACT-CONTROL                        97.9 (1.3)
                NO-FUSE-CONTROL                         50.7 (6.0)

Table  5.  Discrimination  performance  for  binaural  (percent  correct)  and  dichotic  trials  (percent  "same"  responses)  for  subjects  in 
Experiment  2.  Standard  errors  are  shown  in  parentheses. 




Table 6. Symbols used in probability formulae.

Key to Equation Symbols

Probability Symbol        (Probability of) Event
$F$                       fusion
$F_A$                     fusion of A stimulus*
$F_X$                     fusion of X stimulus
$F_{AX}$                  fuse both A and X
$\bar{F}$                 no fusion
$\bar{F}_A$               no fusion of A
$\bar{F}_X$               no fusion of X
$\bar{F}_{AX}$            fuse neither
$M$                       migration (or dominance)
$\bar{M}$                 no migration
$V$                       veridical perception
s | Condition             same response | Condition
d | Condition             different response | Condition
$a$                       binaural accuracy, identical chord trials
$b$                       binaural accuracy, different chord trials

*Subscripts for A and X stimuli also apply to migration and veridical perception.


Table  7.  Probability  formulae  for  determining  the  likelihood  of  various  perceptual  organizations  for  the  given  conditions  in 
Experiments  1  and  2. 

Equation   Formulae

1     $(s \mid \text{FUSE-EITHER}) = a(\bar{F}_{AX} + F_{AX}) + (1-b)[1 - (\bar{F}_{AX} + F_{AX})]$

1b    $(s \mid \text{FUSE-EITHER}) = [a\bar{F}_{AX} + (1-b)(1 - \bar{F}_{AX})] + (a+b-1)F_{AX}$

2     $(s \mid \text{FUSE-NEITHER}) = a\bar{F}_{AX} + (1-b)(1 - \bar{F}_{AX})$

3     $(s \mid \text{FUSE-EITHER}) = (s \mid \text{FUSE-NEITHER}) + (a+b-1)F_{AX}$

3'    $F_{AX} = [(s \mid \text{FUSE-EITHER}) - (s \mid \text{FUSE-NEITHER})]/(a+b-1)$

4     $(s \mid \text{FUSE-1}) = aF_X V_A + (1-b)(1 - F_X V_A)$

5a    $(s \mid \text{MIGRATE-1ST}) = aM_A V_X + (1-b)(1 - M_A V_X)$

5b    $(s \mid \text{DISSONANT-FUSE-1ST}) = aF_A V_X + (1-b)(1 - F_A V_X)$

5a'   $(s \mid \text{MIGRATE-1ST}) = aM_A + (1-b)(1 - M_A)$

5b'   $(s \mid \text{DISSONANT-FUSE-1ST}) = aF_A + (1-b)(1 - F_A)$

6     $(F_A \text{ or } M_A) = F_A + M_A - F_A M_A$


Figure  Caption 

Figure 1. Proposed processes leading to object identification according to Feature Integration Theory, including preattentive independent feature extraction and attentive conjoining of features.


[Figure 1: Feature Integration Theory (Location Map)]


Mapping  Percepts  in  the  Major  Variant  of  the  Octave  Illusion 
Wenyi Huang, Michael D. Hall, & Richard E. Pastore
Center for Cognitive and Psycholinguistic Sciences
State University of New York at Binghamton
Binghamton,  NY  13902-6000 

Abstract 

The current study investigated stimulus and perceptual factors critical to the commonly perceived variant of the octave illusion (e.g., Deutsch, 1974a). In contrast to the more typical illusion pattern, in this variant the fused percept shifts only slightly in pitch along with the corresponding shift in lateralization. The perceptual characteristics of this version of the illusion reflected properties of dichotic fusion rather than the effects of sequential characteristics of the stimuli. Perception of the illusion remained stable despite large variation in ISI (100-2200 ms) between dichotic pairs. Additionally, neither the lateralization tendency nor the pitches of fused percepts were generally affected by changes in sequence length (2-, 4-, and 12-pair sequences); these tendencies also were observed with a single isolated pair of dichotic tones. Furthermore, the perceived illusory pitches consistently corresponded to frequencies other than those presented to either ear, and seem to reflect a specific weighted averaging of the component stimuli. Possible higher-level processing and individual differences in the perception of this variant of the illusion also are discussed.


The octave illusion (Deutsch, 1974a) is an example of a perceptual error whose delineation may help us understand certain perceptual processes (e.g., how dichotic information is processed). The physical condition for the illusion is the dichotic presentation of two tones separated by an octave (typically 400 and 800 Hz). The tones rapidly alternate between the ears, so that when one ear receives the low (400 Hz) tone, the other ear simultaneously receives the high (800 Hz) tone. This physical condition is illustrated in the left column of Figure 1.
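The Python sketch below lists the frequency delivered to each ear on successive dichotic pairs, to make the stimulus arrangement concrete. Only the 400/800 Hz octave pair and the ear-to-ear alternation come from the text. The function name and its default (800 Hz first in the right ear, the arrangement used in Experiment 1 below) are illustrative choices.

```python
LOW_HZ, HIGH_HZ = 400, 800   # the octave pair typically used


def dichotic_schedule(n_pairs, high_first_ear="right"):
    """Return (right_ear_hz, left_ear_hz) for each successive dichotic pair.

    The two octave tones are presented simultaneously, one per ear, and swap
    ears on every pair.
    """
    pairs = []
    for k in range(n_pairs):
        high_on_right = (k % 2 == 0) == (high_first_ear == "right")
        pairs.append((HIGH_HZ, LOW_HZ) if high_on_right else (LOW_HZ, HIGH_HZ))
    return pairs


print(dichotic_schedule(4))  # [(800, 400), (400, 800), (800, 400), (400, 800)]
```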


Insert  Figure  1  about  here 


According to the original report by Deutsch (1974a), listeners perceived several illusory patterns. The most often perceived pattern, reported by 58% of right-handed listeners, was of "a single tone oscillating from ear to ear, whose pitch also oscillated from one octave to the other in synchrony with the localization shift" (p. 308). Two subjects with absolute pitch verified the perceived pitches of this pattern, identifying the oscillating pitches as G4 (392 Hz) and G5 (784 Hz). This perceptual condition, illustrated in the right column of Figure 1, is what comes to mind when one mentions the octave illusion, and it is the form of the illusion most typically investigated (see below). The second most commonly perceived pattern, reported by 25% of right-handed listeners, was "of a single tone oscillating from ear to ear, whose pitch either remains constant or shifts very slightly" (p. 308).

Later studies by Deutsch and colleagues focused only on the most typical illusory percept, studying factors that are potentially critical to the incidence of the illusion (e.g., Deutsch 1974a, 1974b, 1975a, 1975b, 1976, 1978a, 1978b, 1981, 1988; Deutsch and Roll 1976). For example, Deutsch (1976) reported a significant tendency for right-handed listeners to hear a sequence whose pitch corresponded to the stimulus delivered to the right ear (i.e., a "right ear dominance") and which was lateralized to the ear receiving the higher frequency (the lateralization-by-frequency effect; also see Deutsch 1983). Based on this evidence of ear dominance and lateralization-by-frequency, Deutsch and Roll (1976) hypothesized that the illusion involves two different mechanisms. A "where" mechanism localizes the fused tone in the ear receiving the higher frequency input. A "what" mechanism determines the perceived pitch of that tone from the frequency presented to the dominant ear (for most right-handers, the right ear). In Deutsch (1980), subjects were selected on the basis of hearing the typical illusion pattern (as shown in Figure 1), and they reported hearing an octave difference between successive tones. Furthermore, when asked to match the successive pitches in the illusion sequence with those of a single binaural sequence, their matches all approximated a succession of tones spaced an octave apart (Deutsch, 1983).
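The two-mechanism account can be expressed as a small rule. The Python sketch below assumes that each dichotic pair contains one 400 Hz and one 800 Hz tone. Location follows the ear receiving the higher frequency ("where"), and pitch follows the frequency at the dominant ear ("what"). This is our paraphrase of Deutsch and Roll's (1976) hypothesis for illustration, not their implementation.

```python
def deutsch_roll_percept(right_hz, left_hz, dominant_ear="right"):
    """Predicted (pitch_hz, location) for one dichotic octave pair.

    Assumes the two frequencies differ (true for the 400/800 Hz pairs).
    """
    where = "right" if right_hz > left_hz else "left"        # lateralize to higher frequency
    what = right_hz if dominant_ear == "right" else left_hz  # pitch from the dominant ear
    return what, where


illusion_sequence = [(800, 400), (400, 800), (800, 400), (400, 800)]  # (right, left) in Hz
for right_hz, left_hz in illusion_sequence:
    print(deutsch_roll_percept(right_hz, left_hz))
```

For a right-dominant listener, the rule yields a high pitch on the right alternating with a low pitch on the left, which is the typical illusion pattern. Setting `dominant_ear="left"` instead yields a low pitch on the right alternating with a high pitch on the left.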

The lateralization-by-frequency effect has been studied under a number of different stimulus conditions for the group of listeners exhibiting the most common perceptual form of the illusion. Lateralization in the illusion does not appear to depend normally on loudness differences between dichotic components. In fact, for some subjects, lateralization still occurred when the 800 Hz tone was substantially lower in amplitude than the 400 Hz tone. However, lateralization does depend on sequential relationships inherent in the illusion stimuli. For example, a strong lateralization-by-frequency effect is observed only if an illusion sequence of sufficient length is presented; the effect is substantially weaker with 2 than with 20 dichotic pairs (Deutsch 1983). Furthermore, the strength of this effect decreased as the time between onsets of successive dichotic pairs increased (e.g., interstimulus interval; see Deutsch, 1982). Deutsch (1978, 1981) also demonstrated that lateralization to the higher frequency signal occurred even when the lower frequency signal was more than 12 dB greater in amplitude.

Sequential interactions in the stimulus sequence also are critical to the observation of ear dominance effects in the more typical form of the illusion (Deutsch 1980). Specifically, ear dominance is reduced when the alternating presentation of the octave component stimuli is disrupted by inserting either a binaural 599 Hz tone or a dichotic pair of different (non-octave) component frequencies between the dichotic 400 and 800 Hz pairs. Finally, because subjects in these studies often were selected for exhibiting a strong pitch memory, Deutsch may have unintentionally selected subjects who perceived the illusion in a unified manner (perhaps the most common manner). We will return to this issue shortly.

The Purpose of the Current Study

As noted above, factors critical to the illusion have been studied extensively for listeners who perceive a single tone with an octave difference in pitch (the most commonly reported pattern in the original study). However, these factors (including sequential relationships) have not been studied for the many listeners who perceive a slight (or no) pitch shift (the second most commonly perceived pattern for right-handed listeners). As an unfortunate result, no explanations are currently available for the vast perceptual differences between listeners given the illusion sequence.

We initially set up a laboratory demonstration (described in more detail below) in which the physical conditions for the illusion were followed by physical conditions that matched the description of the most common form of the illusion (see Figure 2). Informal listening among our laboratory staff and visitors to the laboratory revealed two basic patterns of perception. These patterns could easily be judged when the physical match to the expected perception served as a standard or reference. Listeners with extensive musical experience (like the listeners in Deutsch, 1980, who exhibited strong pitch memory) tended to hear two things: a sequence of octave-related stimuli that switched from ear to ear, and a faint, more centrally localized stimulus that seemed to shift slightly in pitch. Most listeners without musical training instead reported hearing a unified (probably complex) stimulus that appeared to shift somewhat toward each ear, with a small pitch shift. This latter pattern corresponds to the "single pitch" group originally identified by Deutsch (1974a). The current study focuses on this population and perceptual pattern. Specifically, we studied the effect of sequential relationships on both the lateralization-by-frequency effect and the perceived illusion pitches for this group of listeners. By mapping out a reasonable approximation of illusion perception for this alternative group of listeners, the current study provides the basis for a clear description, and potential explanations, of individual differences associated with the illusion.

It is noteworthy that the unusual and unexpected pattern of pitch results is characteristic of one type of listener and not of others. As a result, we will not argue that the perceived pitches reported here primarily reflect peripheral mechanisms for frequency encoding (which are readily addressed by existing pitch models). Rather, we will argue that the difference in reported pitches across listeners in the original study (Deutsch, 1974a) probably reflects a difference in listening strategies. These strategies may vary as a function of musical experience and proficiency in processing acoustic stimuli.

Experiment  1 

Deutsch reported that lateralization-by-frequency decreased with increasing time between onsets of the identical frequencies at the two locations (Deutsch, 1982, p. 114). Deutsch also reported that "the durations of the tones themselves do not appear of importance and neither does the time interval between the offset of one tone and the onset of its successor" (Deutsch, 1980, p. 585). Thus, sequential characteristics affect the incidence of the octave illusion.

Experiment 1 sought to replicate Deutsch's initial findings for the alternative group of subjects by systematically manipulating ISI in the octave illusion. This initial experiment used a structured self-report procedure. The subsequent experiments augmented this type of procedure with other procedures to investigate the role of various factors that may contribute to the octave illusion.

Method 

Subjects. Four undergraduate students from the State University of New York at Binghamton participated in the experiment for course credit, and another eight subjects from the university area were paid for their participation. All twelve subjects reported normal hearing. Subjects in this experiment were run in a sound-isolated booth. Each subject was classified by the pattern they perceived given the illusion sequence.

Stimuli.  400  Hz  and  800  Hz  pure  tones  were  computer  generated  using  a  12-bit  D/A  converter  with  10  kHz  sample 
rate  and  4  kHz  low-pass  filtering.  The  250  ms  stimuli  began  and  ended  at  positive  waveform  zero-crossings  with  no  ramps.  All 
the  stimuli  used  in  subsequent  experiments  were  generated  in  an  identical  fashion. 
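For readers who want to reproduce the stimuli, the sketch below synthesizes one unramped 250 ms tone at the stated 10 kHz sample rate and quantizes it to integer codes. Scaling to ±2047 (a signed 12-bit range) is our assumption about how the 12-bit D/A converter was driven. The 4 kHz low-pass filter is not modeled. Because 400 Hz and 800 Hz complete exactly 100 and 200 cycles in 250 ms, a tone that starts at a positive-going zero-crossing also returns to a zero-crossing at its end.

```python
import math

SAMPLE_RATE = 10_000   # Hz, as stated in the Stimuli paragraph
DURATION_S = 0.250     # 250 ms tones
FULL_SCALE = 2047      # largest positive code of a signed 12-bit converter (assumption)


def pure_tone(freq_hz: float) -> list[int]:
    """12-bit integer samples of an unramped sine starting at zero phase (rising)."""
    n = round(SAMPLE_RATE * DURATION_S)
    return [round(FULL_SCALE * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]


def cycles_in_tone(freq_hz: float) -> float:
    """Number of cycles in the tone; a whole number means it ends on a zero-crossing."""
    return freq_hz * DURATION_S


for f in (400, 800):
    samples = pure_tone(f)
    print(f, len(samples), cycles_in_tone(f), samples[0], samples[1] > 0)
```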

Procedure. Four pairs of 400 Hz and 800 Hz pure tones were presented to subjects through TDH49-10Z headphones at 75 dB in the manner summarized in the left column of Figure 2, with the 800 Hz tone always presented first in the right ear. For each trial, ISI was selected randomly from values ranging between 100 and 2200 ms in increments of 300 ms, so ISI varied between trials. Note that the repetition period of the stimuli (period = 2 × (duration + ISI)) covaried with ISI.


Insert Figure 2 about here


The experiment consisted of 40 trials (5 repetitions at each ISI) for each subject. On each trial, subjects listened to the stimulus sequence and then indicated the nature of their perception by checking one of the six specified patterns summarized in Table 1. Subjects also were encouraged to report any perceptions that did not correspond to one of the six patterns. After the experiment, subjects were asked to describe what they typically perceived.
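The trial structure of Experiment 1 can be sketched as follows: eight ISI values from 100 to 2200 ms in 300 ms steps, five repetitions each, 40 trials in all. Shuffling a balanced list is one plausible way to implement random ISI selection while guaranteeing five repetitions per ISI. The text does not say how randomization was actually done, and the function names are hypothetical.

```python
import random

TONE_MS = 250                                  # tone duration from the Stimuli paragraph
ISI_VALUES_MS = list(range(100, 2201, 300))    # 100, 400, ..., 2200 ms (8 values)
REPS_PER_ISI = 5                               # 8 ISIs x 5 repetitions = 40 trials


def make_trial_list(seed=None):
    """Return a shuffled list of per-trial ISIs, each ISI appearing REPS_PER_ISI times."""
    trials = [isi for isi in ISI_VALUES_MS for _ in range(REPS_PER_ISI)]
    random.Random(seed).shuffle(trials)
    return trials


def repetition_period_ms(isi_ms, tone_ms=TONE_MS):
    """Period of the repeating pattern, 2 * (duration + ISI), as defined in the text."""
    return 2 * (tone_ms + isi_ms)


trials = make_trial_list(seed=1)
print(len(trials), sorted(set(trials)))
print([repetition_period_ms(isi) for isi in (100, 2200)])  # [700, 4900]
```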


Insert  Table  1  about  here 


Results  and  Discussion 

For all values of ISI, 10 of the 12 subjects always reported perceiving an isolated high pitch on the right side of their head alternating with a low pitch on the left side (equivalent to the right column of Figure 2 and pattern 1 in Table 1). One of the remaining 2 subjects simply reversed pitch location, always reporting a low pitch on the right and a high pitch on the left (pattern 2) for all values of ISI. The remaining subject failed to hear the illusion, consistently reporting two simultaneous pitches (pattern 6) in opposite ears, again for all values of ISI. In the post-experiment debriefing, most subjects indicated verbally that the pitches were not fully lateralized to either ear.

The finding that perception of the illusion is independent of ISI (over the range of 100 to 2200 ms) indicates that the octave illusion is not sensitive to the silent gap between tones. Had ISI been critical, the illusion should have weakened or disappeared at long ISIs.


Experiment  2 

Following the initial results of Experiment 1, we attempted to develop an objective 2IFC task tailored to investigate the location and pitch of the percepts in the octave illusion. Initially, we set up a two-component trial structure, with each component consisting of a sequence of tones. One component on the trial was a sequence of dichotic 400 and 800 Hz tones alternating between the ears (the illusion stimulus configuration). The other consisted of the 800 Hz tone presented alone to the right ear, alternating with the 400 Hz tone presented alone to the left ear (i.e., following the typical description of illusion perception, as summarized in Figure 1). The effort to develop a relevant 2IFC task failed, but it did demonstrate some unexpected aspects of the illusion. With an external standard as a reference on each trial, most of our laboratory staff localized each perceived pitch toward the ear that received the 800 Hz tone.¹ However, the perceived pitches also were quite different from (and intermediate to) the monaural 800 and 400 Hz tones. This pattern of perception differs substantially from the typical conception of the illusion, but it is consistent with the perception Deutsch (1974a) described for the "single pitch" group.

The subsequent experiments all build on our informal observations that the perceived pitches and locations in the illusion do not correspond to an 800 Hz tone in one ear and a 400 Hz tone in the other. Rather, they involve relatively smaller shifts, with the perceived pitch lying somewhere within the octave (i.e., the high pitch was much lower than an 800 Hz tone, and the low pitch was higher than 400 Hz). Obviously, the perception of a single, intermediate pitch was quite unexpected, and it does not seem to have a direct counterpart in the extensive research on pitch perception. Experiment 2 was designed to quantify our informal results. Our goal was to specify what pitches are perceived when this form of the illusion is experienced (i.e., a single tone with small alternating shifts in pitch and location). With this mapping, we can begin to address possible explanations for the unexpected findings.

Method 

Subjects. Sixteen subjects participated in Experiment 2. Ten were the subjects from Experiment 1 who had perceived
the expected illusion pattern. These 10 subjects participated in the first pitch-matching condition (see the procedure below). Six
additional subjects were paid for participation in the subsequent conditions. All subjects reported normal hearing and, when
tested (using the procedure from Experiment 1), reported perceiving the alternative form of the illusion.

Stimuli. Seventeen pure tones, ranging from 400 to 800 Hz in 25 Hz steps, were generated as stimuli. These stimuli
also were used in the subsequent experiments.

Procedure. There were three separate pitch-matching conditions in this experiment. The first condition used a version
of the Levitt (1971) up-down procedure to roughly match the pitch of a single binaural tone to the perceived high or low pitch.
In this condition, the fixed sequence of illusion stimuli was presented as in Experiment 1, but now the ISI was reduced to 50 ms
and the sequence was followed by a 250 ms delay, which then was followed by a single 250 ms comparison tone. For a given
block of trials, each subject was instructed to compare either only the perceived high or only the perceived low illusion pitch with
the pitch of the comparison tone. Subjects indicated whether the comparison pitch was higher or lower than the illusion pitch by
pressing a corresponding key (up or down arrow) on a computer keyboard. The process then was repeated for the alternative illusion
pitch. The order of matching high or low illusion pitch was counterbalanced across subjects.

For each block of trials, there were one up sequence and one down sequence of comparison tones, which started
at frequencies of 400 Hz and 800 Hz, respectively. On each trial, the computer determined the current comparison tone by randomly
selecting one of the two sequences. If the comparison tone from that sequence was judged to be higher (or, alternatively,
lower) than the illusion pitch, the frequency of the comparison tone was decreased (or increased) by 25 Hz for the next
selection of the given sequence. For each sequence, the frequency of a comparison tone was recorded when the direction of
frequency change of the adaptive procedure reversed sign. Each reversal in the direction of frequency change (increment or
decrement) is assumed to indicate a crossing of the frequency match (point of subjective equality) with the perceived illusion
pitch. Thus, in such adaptive procedures most reversals are made in the frequency region of pitch equality. A block of trials
ended only after both sequences reached a minimum of 13 recorded frequencies at which the response (and frequency direction)
changed. The derived statistical measures followed standard up-down procedures. In calculating the mean and standard error,
the first 3 recorded frequencies were excluded, thus focusing on the measurements that were narrowly concentrated around the
perceived illusion pitch. Thus, for a given pitch, each block of trials generated two mean frequency values (one for each
sequence), each based upon a minimum of 10 measurements.
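The interleaved up-down tracking described above can be sketched as a small Python simulation. This is a minimal illustration, not the laboratory's software: the 25 Hz step, the 400 and 800 Hz starting points, the 13-reversal stopping rule, and the exclusion of the first 3 reversals come from the text, while the simulated listener (`pse_hz`, `noise_sd`) and all names (`Track`, `run_block`) are hypothetical. A simple 1-up/1-down rule of this kind converges on the frequency judged "higher" half the time, i.e., the point of subjective equality.

```python
import random
from statistics import mean, stdev

STEP_HZ = 25        # step size given in the text
MIN_REVERSALS = 13  # block ends once both tracks have at least this many
N_DISCARD = 3       # first reversals dropped before averaging


class Track:
    """One up-down sequence of comparison frequencies."""

    def __init__(self, start_hz):
        self.freq = start_hz
        self.direction = 0   # +1 rising, -1 falling, 0 before the first response
        self.reversals = []  # comparison frequencies at which the direction flipped

    def respond(self, judged_higher):
        # "Higher" lowers the next comparison tone; "lower" raises it.
        new_direction = -1 if judged_higher else 1
        if self.direction and new_direction != self.direction:
            self.reversals.append(self.freq)
        self.direction = new_direction
        self.freq += new_direction * STEP_HZ

    def estimate(self):
        """Mean and standard error of the reversals kept after discarding."""
        kept = self.reversals[N_DISCARD:]
        return mean(kept), stdev(kept) / len(kept) ** 0.5


def run_block(pse_hz, noise_sd=0.0, seed=0):
    """Interleave an up track (400 Hz start) and a down track (800 Hz start).

    The listener is simulated (hypothetical): a comparison is judged "higher"
    when it exceeds pse_hz plus Gaussian noise with SD noise_sd.
    """
    rng = random.Random(seed)
    tracks = [Track(400), Track(800)]
    while min(len(t.reversals) for t in tracks) < MIN_REVERSALS:
        track = rng.choice(tracks)  # random selection of a sequence on each trial
        internal_pitch = pse_hz + rng.gauss(0.0, noise_sd)
        track.respond(track.freq > internal_pitch)
    return [t.estimate() for t in tracks]


(up_mean, up_se), (down_mean, down_se) = run_block(510.0)
print(up_mean, down_mean)
```

With a noiseless simulated listener whose pitch is 510 Hz, every reversal falls on 500 or 525 Hz, the two comparison steps that bracket that pitch, so both track means land between them; the two-track design is what lets the text use agreement between the sequence means as a consistency check.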

The up-down procedure has a built-in criterion for consistency: if responses drive the two sequences together, the subject
must have a consistent basis for responding. Furthermore, step size determines the precision of measurement. If all subjects give
highly similar results, then the subjects have a similar basis for responding. Thus, results of individual subjects were examined to
ensure that within each block of trials the two sequences converged quickly on a specific frequency and that convergence was
maintained. Across all subjects, the average difference between the two sequence means was [illegible] Hz, with an average standard
deviation for sequence reversals of 10 Hz. Therefore, individual subjects were highly consistent in responding based upon a
single, stable pitch percept.

The up-down procedure also has several disadvantages for our use in the current study. Because of its adaptive




Octave  Illusion 


4 


structure, we could only run one subject at a time. Furthermore, because of hardware constraints, the subject had to be seated
near the computer, and thus in the laboratory control room (rather than in a sound chamber), shutting down the rest of the
laboratory. Therefore, for the remainder of the study we used a different procedure in which multiple subjects were
run simultaneously in sound chambers.

The second pitch-matching condition again evaluated the perceived high and low pitches in 4-pair illusion sequences,
this time using the method of constant stimuli. In addition to the pragmatic concerns summarized above, there were two reasons
to run this condition. First, although the method of constant stimuli lacks an inherent internal standard for reliability, the shape and
slope of each individual's psychometric function can provide a basis for evaluating reliability and precision of measurement. Thus,
the method of constant stimuli allows an alternative evaluation of the stability of perception in terms of the slope of the
psychometric function of each individual subject (computed by linear regression of z-score transformed data). Second, we could
compare results across methods to determine whether the obtained pitch matches depend upon the psychophysical method used.

The  task  for  the  subjects  in  all  conditions  was  to  report  whether  the  pitch  of  the  comparison  tone  was  higher  or  lower  than 
the  designated  (high  or  low)  pitch  in  the  illusion  sequence.  Subjects  indicated  their  response  by  pressing  one  of  two 
corresponding  buttons.  In  this  condition  the  comparison  stimulus  for  each  trial  was  randomly  selected  from  the  full  octave  range 
(400-800 Hz), not just a narrow range of comparison frequencies determined by previous trials. (Pilot work with both methods
also had used stimuli outside of the octave, but the pitch of these stimuli was clearly highly discrepant from any illusion pitch,
and their inclusion decreased the amount of relevant data that could be collected from each individual subject. The
comparison stimuli thus were limited to the octave range.) A third pitch-matching condition was run with the method of constant
stimuli to evaluate the possibility that the perceived difference between high and low pitches was a function of sequence length.
This third condition used 12 pairs of illusion stimuli.

Results  and  Discussion 

The pitch comparison results from the first condition (adaptive procedure) are summarized in Figure 3, which plots the
frequencies of tones matched to the high versus low pitch for each subject. The two axes of this scatter plot represent the full
range of the octave. Most of the data are concentrated in the lower half of the octave, showing that the perceived high and
low pitches were closer to 400 Hz than to 800 Hz. The regression line has a slope of 0.91 (r² = .78), indicating a relatively
constant difference between high and low pitch matches and consistency of data across subjects. The mean frequencies of the high
and low pitch matches were 550 and 501 Hz, respectively, resulting in a mean difference of 49 Hz.


Insert Figure 3 about here


Under the method of constant stimuli, the psychometric functions of each individual subject in the 4- and 12-pair
conditions exhibited very steep slopes; the mean frequency change for the interquartile range (middle 50% of a psychometric
function) was only 10 Hz. The steep slopes of the individual psychometric functions indicate that the perceived frequencies of
the high and low pitches in the octave illusion were highly stable for each subject.
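Under a cumulative-normal (probit) model, an assumption consistent with the z-score regression described earlier, the interquartile range of a psychometric function follows directly from its slope: the 25% and 75% points lie ±0.674 z units from the pitch match. The hypothetical helpers below convert between the two; a 10 Hz interquartile range corresponds to a cumulative-normal standard deviation of about 7.4 Hz.

```python
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # ≈ 0.6745, z-score of the 75th percentile


def interquartile_width_hz(slope_z_per_hz):
    """Frequency span between the 25% and 75% points of a probit function."""
    return 2 * Z75 / slope_z_per_hz


def sigma_from_iqr(iqr_hz):
    """Standard deviation (Hz) of the cumulative normal with that interquartile range."""
    return iqr_hz / (2 * Z75)


print(round(sigma_from_iqr(10.0), 2))  # 7.41
```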

The scatter-plotted results for the method of constant stimuli are summarized by the filled symbols in Figure 4 (the
open symbols are the results from a subsequent experiment). The filled circles and filled squares represent the data for 4- and
12-pair sequences, respectively. The pitch-match data for both sequences again are concentrated in a very narrow range of
frequencies in the lower half of the octave, revealing that most subjects perceived highly similar pairs of pitches. In fact, the
range of perceived pitch across subjects is too narrow to compute a meaningful regression line for the full data set. With the
4-pair sequence, the respective mean frequencies of the high and low pitch matches across subjects were 548 and 510 Hz when computed
from the psychometric function using a linear regression based upon frequency (or 546 and 507 Hz when based upon logarithmic
frequency), resulting in a mean difference of 38 Hz. With the 12-pair sequence, the mean frequencies for high and low pitches
were 538 and 497 Hz (or 536 and 495 Hz when based upon logarithmic frequency), resulting in a mean difference of 41 Hz.
(The lack of differences between matches obtained from regressions based on linear and logarithmic frequency reflects the steep
slopes of the individual psychometric functions.) The difference between perceived high and low pitches within each condition
was statistically significant in all three pitch-matching conditions [4-pair adaptive procedure, t(9) = 5.96, p < .01; 4-pair constant
stimuli procedure, t(5) = 5.24, p < .01; and 12-pair constant stimuli procedure, t(5) = 5.20, p < .01].
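The within-condition comparisons above appear to be paired t-tests on each subject's high and low matches: the reported degrees of freedom equal the number of subjects minus one (9 for the 10 adaptive-procedure subjects, 5 for the 6 constant-stimuli subjects). A minimal sketch of that statistic follows; the p-value lookup is left to a statistics package, and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(high_hz, low_hz):
    """Paired t statistic and degrees of freedom for high-minus-low differences."""
    diffs = [h - l for h, l in zip(high_hz, low_hz)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical matches for five listeners.
t, df = paired_t([551, 552, 553, 554, 555], [550, 550, 550, 550, 550])
print(df, round(t, 3))  # 4 4.243
```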


Insert Figure 4 about here


Given the substantial overlap of the pitch-matching data for the 4- and 12-pair constant stimuli conditions, it is not
surprising that the t-test of the perceived pitch difference between the 4-pair and 12-pair sequences was far from being
statistically significant [t(5) = .20, p > .50]. Perceived pitch, therefore, was not a function of sequence length, at least for the
sequences studied. Furthermore, the difference between the perceived pitches measured using the adaptive procedure (first
condition) did not differ from that measured using the method of constant stimuli (second condition) [t(14) = .94, p > .2], showing
that the psychophysical method did not affect the measured pitch difference. Both procedures consistently indicate that the pitches
fall within a narrow range between approximately 500 and 550 Hz, a span far smaller than the 400 Hz span defining the full octave.
Therefore, similar patterns of percepts were observed with either method. The following two experiments used a single method, the
method of constant stimuli.
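The between-method comparison, t(14), has degrees of freedom consistent with a pooled-variance (Student) independent-samples test on the 10 adaptive-procedure and 6 constant-stimuli subjects (10 + 6 − 2 = 14), presumably applied to each subject's high-minus-low difference. That reading is an assumption; the sketch below shows the computation with hypothetical difference scores.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical high-minus-low differences (Hz) for 10 and 6 listeners.
adaptive = [52, 40, 61, 45, 48, 55, 39, 50, 47, 53]
constant = [35, 42, 40, 33, 44, 36]
print(pooled_t(adaptive, constant))
```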

We do not claim that our subjects, who are relatively musically naive, are perceiving a singular tonal quality. It is
quite possible that the subjects are hearing a complex stimulus that has a distinct pitch quality, and that this pitch clearly does
not correspond to the pitch of either of the original octave stimuli. Instead, subjects reported perceiving pitches that were
intermediate between the original frequencies and that were skewed toward the low-frequency end of the octave. The pitch
match results seem to suggest that our perceptual system is performing some type of weighted average of the dichotic frequency
inputs, resulting in the perception of a fused stimulus of intermediate pitch. A quantification of this averaging process will be
presented in the general discussion, along with a discussion of the possible nature of the pitch percept.
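One way to make "some type of weighted average" concrete is to ask what weight w on the 800 Hz input would reproduce a given match. The text defers its own quantification to the general discussion, so the two models below (averaging on a linear frequency scale versus a logarithmic one) and their function names are illustrative assumptions, not the authors' model.

```python
import math

LOW_HZ, HIGH_HZ = 400.0, 800.0  # the two dichotic components


def weight_linear(match_hz):
    """w such that match = (1 - w) * 400 + w * 800 (linear-frequency averaging)."""
    return (match_hz - LOW_HZ) / (HIGH_HZ - LOW_HZ)


def weight_log(match_hz):
    """w such that match = 400 ** (1 - w) * 800 ** w (log-frequency averaging)."""
    return math.log(match_hz / LOW_HZ) / math.log(HIGH_HZ / LOW_HZ)


for match in (501.0, 550.0):  # Experiment 2 adaptive-procedure means, in Hz
    print(match, weight_linear(match), weight_log(match))
```

For the 501 and 550 Hz means this gives weights of roughly 0.25 and 0.38 on a linear scale, or 0.32 and 0.46 on a log scale; under either scale both matches fall below w = 0.5, i.e., they are pulled toward the 400 Hz component.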

Experiment  3 

A reviewer of an earlier version of this manuscript suggested that our subjects were actually hearing stimuli an octave
apart, with the pitch results reflecting an inability of musically naive subjects to perform the matching task. Given this lack
of ability, the two procedures were conjectured to exhibit a regression toward an intermediate frequency, rather than
providing separate measures of the same percept. This reviewer further suggested that the use of a broader range of comparison
stimuli probably would result in different pitch matches. A second reviewer conjectured that the intermediate pitch results might
be artifacts of having used linear rather than log frequency for the psychometric functions. Therefore, in this experiment we
again present both solutions. Experiment 3 represents a control to verify that the pitch results are reasonable and not
artifacts.

Method 

Subjects.  Ten  subjects  participated  in  Experiment  3.  Nine  were  undergraduate  students  who  participated  in  fulfillment 
of  course  requirements.  The  first  author  also  participated. 

Stimuli.  The  comparison  tones  were  generated  in  the  same  fashion  as  in  the  earlier  experiments.  The  frequencies  of 
the  comparison  tones  ranged  from  350  to  850  Hz  with  a  25  Hz  step  size. 

Procedure.  Two  conditions  were  run  using  the  method  of  constant  stimuli.  In  the  first  condition,  subjects  matched  the 
high  or  low  pitch  of  a  4-tone  monaural  sequence  consisting  of  alternating  400  and  800  Hz  tones,  thus  mimicking  the  most  typical 
illusory  pattern  of  pitches  reported  by  Deutsch  (1974a).  The  second  condition  instead  used  the  4-pair  illusion  sequence  of 
Experiment  2  as  stimuli. 

Results  and  Discussion 

The pitch-match results from both conditions are shown in Figure 5. The open symbols represent the pitch matches
from the monaural condition; the filled symbols represent pitch matches from the illusion sequence. There is no overlap of
matches between the two conditions. The dark asterisk represents the mean pitch matches obtained in Experiment 2 for the
4-pair illusion using the method of constant stimuli. This reference point is well within the range of results for the replication
of  this  condition. 

The mean pitch matches for the monaural 400 and 800 Hz tones were 444 Hz and 739 Hz, respectively; the difference
of 295 Hz between high and low pitches is significant [t(9) = 13.25, p < .01]. In the illusion sequence condition, the mean matches
for the high and low pitches were 528 and 479 Hz, respectively; the difference of 49 Hz between pitches is also significant
[t(9) = 3.68, p < .01]. Clearly, the actual pitch matches and the magnitude of differences between high and low pitches differ
significantly across the two conditions [295 vs. 49 Hz; t(9) = 9.44, p < .01].

Differences between pitch matches from regressions based upon linear and log frequency psychometric functions did
not exceed 1 Hz in the monaural condition or 3 Hz in the illusion sequence condition. Again, this lack of difference indicates
steep slopes for the individual psychometric functions, and thus the consistency of pitch judgments. If the obtained matches to
the illusion sequence merely reflected subjects attending to the frequency of either component in isolation (i.e., sometimes
perceiving a 400 Hz pitch and other times perceiving an 800 Hz pitch), as a reviewer suggested, then steep psychometric
functions would not have been obtained. Thus, these matches appear to reflect the stable perception of pitch based upon some
sort of averaging of the dichotic components.
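The argument in this paragraph can be checked with a small simulation. Under the reviewer's alternative, a listener judges the comparison tone against a 400 Hz pitch on some trials and an 800 Hz pitch on others; the resulting psychometric function has a plateau near .5 across most of the octave, so its interquartile span is hundreds of Hz rather than the roughly 10 Hz observed. The 50/50 mixing proportion, the 10 Hz spread of each internal pitch, the 510 Hz single-pitch value, and all function names are hypothetical.

```python
from statistics import NormalDist

SIGMA_HZ = 10.0  # hypothetical spread of each internal pitch


def p_higher_single(f_hz, pitch_hz=510.0, sigma=SIGMA_HZ):
    """P("comparison is higher") for a listener with one stable pitch."""
    return NormalDist(pitch_hz, sigma).cdf(f_hz)


def p_higher_mixture(f_hz, sigma=SIGMA_HZ):
    """Same, if half the trials use a 400 Hz pitch and half an 800 Hz pitch."""
    return 0.5 * NormalDist(400.0, sigma).cdf(f_hz) + 0.5 * NormalDist(800.0, sigma).cdf(f_hz)


def freq_at(p_fn, target, lo=350.0, hi=850.0, iters=60):
    """Bisection: frequency at which a nondecreasing p_fn reaches target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if p_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def interquartile_span(p_fn):
    """Distance in Hz between the 25% and 75% points of p_fn."""
    return freq_at(p_fn, 0.75) - freq_at(p_fn, 0.25)


print(round(interquartile_span(p_higher_single), 1))  # 13.5
print(round(interquartile_span(p_higher_mixture)))    # 400
```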

Like the other subjects, the three subjects who most accurately matched the 400 and 800 Hz tones in the monaural
condition (represented by open circles in Figure 5) also gave illusion pitch matches (the filled circles in Figure 5) that closely
approximated the mean results from Experiment 2. Except for the one subject who could not accurately perform the monaural
pitch-matching task (open square), most subjects provided relatively accurate monaural performance. Such relatively accurate
matches indicate that most subjects were capable of reasonable pitch matching, and all subjects provided very different
pitch matches for the illusion and control conditions. Thus, the reported pitches for these subjects in the illusion condition
should reasonably reflect their perception, not merely a procedural artifact.

Insert Figure 5 about here


Experiment  4 

In Experiments 1 and 2, incidence of the alternative octave illusion was not greatly affected by either ISI or sequence
length (4 vs. 12 pairs). We thus wanted to evaluate the incidence of the illusion with shorter stimulus sequences.
Experiment 4, therefore, evaluated the perception of pitch and location for a minimal illusion sequence of the dichotic 400 and
800 Hz tones (i.e., 2 dichotic pairs, and thus 1 illusion cycle).

Method 

Subjects. Sixteen subjects participated in Experiment 4. Eleven were undergraduates participating for course credit;
four others were paid for their participation. The first author also served as a subject.

Procedure. There were two conditions in this experiment. In the first condition, only the two-pair illusion sequences
were presented, with the 800 Hz tone always in the right ear for the first pair. As in Experiment 2, the duration of each pair
was 250 ms and the interval between the two pairs was 50 ms. On each trial, subjects reported whether one or two tones were
perceived for each of the first and second dichotic pairs presented. If perceiving one pitch per dichotic pair, subjects also reported
the location of the percept and which stimulus (1st or 2nd) was higher in pitch. The report procedure for this first condition

was  similar  to  that  used  in  Experiment  1.  Subjects  who  perceived  the  typical  variant  of  the  illusion  subsequently  participated  in 

the  second  condition,  which  consisted  of  separately  matching  the  high  and  low  illusion  pitches  using  the  method  of  constant 
stimuli. 

Results  and  Discussion 

Thirteen subjects perceived the typical variant of the illusion. Two of the remaining 3 subjects reported hearing the
identical pattern of localization, but with the low (rather than the high) pitch first. The remaining subject perceived two
simultaneous pitches for each dichotic pair. Therefore, the rate of perception of the illusion pattern with a minimal illusion
sequence of two dichotic pairs is as high as found for any of the longer illusion sequences studied in the earlier experiments.
Once again, the data showed that sequence length is not critical to the incidence of the illusion.

The pitch-match results for the 13 subjects who perceived high pitches toward the right ear are shown in Figure 4 as
open squares. This scatter plot again indicates that the perceived pitches were distributed over a narrow range of frequencies in
the lower half of the octave. The mean frequencies for perceived low and high pitches were 508 and 535 Hz (505 and 533 Hz
with the log regression solution). Therefore, the mean difference in perceived pitch was 27 Hz. A t-test revealed that the
difference between the matches to the perceived high and low pitches was statistically significant [t(12) = 4.19, p < .01]. Although
there is more variability in these results relative to the 4- and 12-pair sequences (filled symbols), the general trend in the results
is quite similar.

The results of Experiment 4 indicate that the nature of pitch perception in the illusion persists even with a minimal
length sequence. Although the absolute differences between high and low pitches may be somewhat smaller than those found for
longer (2- and 6-cycle) sequences, the results still demonstrate that sequence length is not critical to the illusion. However,
we do not reject the notion that sequence length may have an effect on the perceived difference: although not significant, the
perceived pitch difference in the 4- and 12-pair sequences was greater than that in the 2-pair sequence. Conversely, with most
subjects in Experiment 4 perceiving illusion percepts in a single-cycle sequence, dichotic fusion would seem to be a contributing
factor to the octave illusion, as was proposed by Bregman (1990, p. 306).


Experiment  5 

Experiment 4 suggested that fusion could be the critical factor in the octave illusion. If so, then subjects should still
perceive a single fused tone even if only one pair of dichotic 400 and 800 Hz tones is presented. Furthermore, if pitch matches
to fused tone pairs (differing in the presented location of the 400 and 800 Hz components) are similar to those found in the
earlier experiments, then fusion could be argued to provide a basis for the perception of this alternative form of the octave
illusion. Given the role of other stimulus and listener characteristics in fused stimuli, the central question concerning the illusion
would then become why the pitch perceived toward the right ear differs from (and is usually higher than) that perceived toward
the left ear.

The two phases of the current experiment thus evaluate the nature of perception, including the possible location and pitch
percepts of fused stimuli, associated with a single pair of 400 and 800 Hz tones. The first phase of the experiment evaluated
the number of pitches perceived. If a single pitch was perceived, the subjects were also asked to report the location of the
perceived pitch. The second phase evaluated the frequency of the perceived pitch (assuming the perception of a single tone).

There  is  one  major  conceptual  difference  between  pitch  matching  for  the  current  and  the  earlier  experiments.  In  the 
earlier  experiments  subjects  heard  a  sequence  of  at  least  two  stimuli  and  thus  could  logically  be  instructed  to  match  the  higher  or 
lower  pitch  with  the  pitch  of  the  comparison  stimulus.  With  only  a  single  stimulus,  the  relative  concept  of  "higher"  or  "lower" 
pitch is meaningless. In this experiment we can only evaluate the pitch perceived for a specific condition of stimulus
presentation (e.g., 800 Hz presented to the right vs. the left ear). Given this inherent difference in task, the pitch-matching
results will be discussed in the general discussion section.

Method 

Subjects. Twenty undergraduate students from SUNY-Binghamton participated as a course requirement. All subjects
reported normal hearing and participated in both phases of the experiment.

Procedure. In the first phase of the experiment, following presentation of a pair of dichotic 400 and 800 Hz tones,
subjects pressed a button to indicate whether they perceived one or two tones. If one tone was perceived, subjects also
indicated the location of the tone (left ear, left side, middle, right side, or right ear). There were 20 randomized trials in this
phase of the experiment. For 10 trials, the 800 Hz tone was presented to the right ear; in the other 10 trials, the 800 Hz tone
was presented to the left ear.

The second phase of the experiment was the pitch-matching task using the method of constant stimuli, in which a single
pair of 400 and 800 Hz pure tones was dichotically presented for 250 ms, followed by a 250 ms silent interval, then a 250 ms
comparison tone. Subjects pressed one of two buttons to indicate whether the comparison tone was higher or lower in pitch
than the initial tone.

Results  and  Discussion 

Table 2 summarizes the results of the first phase of the experiment. When the 800 Hz tone was presented to the left
ear, subjects reported hearing one tone on 92% of all trials. A single fused tone was perceived either in the left ear or toward
the left side of the head on 47% of trials, at a central location on 32% of trials, and either toward the right side of the head or
directly in the right ear on only 13% of trials. A chi-square test confirmed that when 800 Hz was presented to the left ear the
tone was more likely to be perceived toward the left (χ² = [illegible], p [illegible]). When the 800 Hz tone was presented to the
right ear, a single tone was perceived on 96% of trials. A single pitch was perceived toward the right side or ear on 68% of trials,
at a central location on 20% of trials, and to the left side or ear on only 8% of the trials (the remaining 4% representing dual-pitch
perception). The location of the perceived tone was significantly more toward the right side when 800 Hz was presented to the
right ear (χ² = 63, p < .01). Furthermore, the incidence of fusion and the pattern of perceived locations for the fused tone were
quite consistent across individual subjects. The results of phase one therefore demonstrate that fusion occurs for a single
dichotic pair of harmonic stimuli, and that the fused tone is more likely to be perceived toward the ear receiving the
higher-frequency input.
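The location analyses rely on chi-square tests. The text does not state the table layout or expected frequencies, so the sketch below assumes a Pearson goodness-of-fit test of observed location counts against equal expected counts across categories, using hypothetical counts; it is not a reconstruction of the reported statistics. For exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−χ²/2), which the helper `p_value_df2` (a hypothetical name) uses.

```python
import math


def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness of fit; equal expected counts by default."""
    n, k = sum(observed), len(observed)
    if expected is None:
        expected = [n / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


def p_value_df2(stat):
    """Upper-tail p for a chi-square with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical counts for left / centre / right location reports.
stat, df = chi_square_gof([30, 20, 10])
print(stat, df, round(p_value_df2(stat), 4))  # 10.0 2 0.0067
```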


Insert  Table  2  about  here 


However, this tendency was stronger when the higher frequency was presented to the right ear, indicating that the
lateralization-by-frequency effect is additionally influenced by a type of right-ear dominance. We realize that this
conceptualization only describes the findings and does not explain them. Possible contributing factors for lateralization
are discussed in more detail in the general discussion.

The pitch-match results for the 20 subjects, shown in the open circles of Figure 4, are highly similar to the results
from the 2-, 4-, and 12-pair sequences from our earlier experiments. These data again are concentrated in the lower portion of
the octave range, but they are distributed sufficiently to compute a reliable regression line (r² = .82). The slope of this line is
.87, which is close to that found with the initial 4-pair sequence (0.91; Figure 3). The slope approaching 1 and the high correlation
coefficient indicate, respectively, that there is an approximately constant frequency difference between the two pitch
matches and that this difference is consistent across subjects. When the right ear received the 800 Hz tone, the mean pitch was
524 Hz (521 Hz for log frequency); when the left ear received the 800 Hz tone, the mean pitch was 503 Hz (501 Hz for log
frequency). The perceived difference is 21 Hz. A t-test showed that the perceived difference between these means was statistically
significant [t(19) = 4.38, p < .01]. Thus, not only did fusion occur for a single pair of dichotically presented 400 and 800 Hz tones,
but the pitch perceived when the 800 Hz tone was presented to the right ear was significantly higher than when 800 Hz was presented
to the left ear. Although smaller than the 38 to 41 Hz difference for the 4- and 12-pair sequences, the difference is quite similar
to the 27 Hz difference for the 2-pair sequence.

These findings provide a strong basis (i.e., the nature of fusion under different presentation conditions) for why, in the
octave illusion, a single tone is perceived, with pitch and location changing with the change of ear presentation. Taken as a
whole, Experiments 1-5 provide support for the argument that the dichotic fusion of octave-related tones is the critical factor
in the octave illusion.


General  Discussion 

The octave illusion has three distinctive perceptual characteristics: (1) one tone is perceived at a time, (2) the
perceived fused tone tends to be lateralized toward the ear receiving the higher frequency input, and (3) the tone perceived toward
each side (right vs. left) has a different pitch. We will address each of these three characteristics individually below.

Perception  of  one  tone  at  a  time 

Subjects in Experiments 1-4 must have fused the dichotic stimuli to perceive a single pitch; if (as found with a few
musically trained subjects) two simultaneous pitches had been consistently perceived, the illusion would not be possible. Taken in
conjunction with the demonstration in Experiment 5 of frequent dichotic fusion for only a single pair of octave-related tones, the
data reveal that the perception of one tone at a time in at least the alternative form of the octave illusion can be attributed to
the dichotic fusion of the octave-related tones. The occurrence of dichotic fusion may be due to some form of special
relationship between harmonically related and, more specifically, octave-related stimuli. McAdams (1982) has demonstrated that
subjective fusion is much higher for harmonic (shared fundamental) than for inharmonic stimuli. More recently, Buell and
Hafter (1991) demonstrated an inability to segregate such harmonically related stimuli even when there is significant lateral
displacement of the tones.

Beyond this general perceptual affinity for harmonically related tones, we have known since at least the time of
Helmholtz that octaves represent the most consonant of stimulus relationships and exhibit the highest tendency for fusion. More
recent empirical findings include those of Ward (1954), who asked listeners to adjust a tone to match a specific pitch
relationship to a presented tone and found that listeners most reliably matched the octave. Ward concluded that the subjective
octave should be the most stable musical pitch relationship.

A special, central nature of the perception of octave-related tones has been discussed by Terhardt (1974) and has been
demonstrated recently by Demany and colleagues (Demany and Semal 1988, 1990; Demany, Semal, and Carlyon 1991). Demany and
Semal (1988) dichotically presented two simultaneous, sinusoidally frequency-modulated tones to listeners, who were instructed
to detect phase differences between the modulated tones. Demany and Semal found that the just-noticeable phase
differences were at a minimum when the tones' center frequencies differed by close to 1200 cents (one octave). These and
similar results have led to the conclusion that octave relationships are special in terms of perception and are central in origin.
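The 1200-cent figure follows from the standard definition of the cent, 1200·log2(f2/f1), under which any 2:1 frequency ratio, including the 400/800 Hz pair used in the current experiments, is exactly one octave. A minimal sketch (the function name `cents` is ours, for illustration):

```python
import math

def cents(f1_hz: float, f2_hz: float) -> float:
    """Size of the interval from f1 to f2 in cents: 1200 * log2(f2 / f1)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

# The 400/800 Hz pair used throughout is a 2:1 ratio, i.e., exactly one octave.
print(cents(400.0, 800.0))  # 1200.0
# For comparison, an equal-tempered semitone (ratio 2**(1/12)) is about 100 cents.
print(round(cents(440.0, 440.0 * 2 ** (1 / 12)), 6))
```

A descending octave simply gives a negative value (-1200 cents), so the sign encodes direction.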

Lateralization of the fused tone

The fused tone tended to be lateralized toward the ear receiving the higher frequency input. However, might such
lateralization be due to differences in loudness rather than frequency? There is some indication that both relative frequency and
loudness are important to the location of a pitch percept, with each playing relatively different roles in different stimulus settings
(e.g., Cutting, 1976). Deutsch (1978, 1981) showed lateralization to the high-frequency ear even when the amplitude of
the higher frequency tone is substantially lower than the amplitude of the low-frequency tone. Efron and Yund (1974, 1975) found
that the location of fused dichotic tones tends to be toward the side receiving the louder stimulus. In contrast to these earlier
studies, our two frequencies (400 and 800 Hz) were equal in intensity and thus should differ in loudness by no more than 3
phons (Fletcher & Munson, 1933). One can obtain perceptible changes in lateralization with interaural differences of 1 dB or
less, but this is more typical for higher frequencies than those used in the current study. Therefore, it is still an
open question whether the significant lateralization effects in the current study were the result of loudness differences. Thus,
although there is insufficient evidence to draw a final conclusion on whether the location of a percept is determined by relative
frequency, by relative loudness, or by both, the preponderance of existing results tends to favor the relative frequency
hypothesis. Experiment 5 also showed that the fused tone was lateralized toward the right side more strongly when the higher
frequency was presented to the right ear than it was lateralized toward the left side when the higher frequency was presented
to the left ear. This finding suggests that the right ear has some advantage in lateralizing the fused tone. Further analyses of
our results (below) allow us to estimate the relative magnitude of this ear advantage.
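The asymmetry can be read off Table 2 by summing cells. In the sketch below (the grouping of "ear" and "side" responses into one total is our illustrative choice), the fused tone is heard on the side of the ear receiving 800 Hz on 68% of trials when that is the right ear, but on 47% of trials when it is the left ear; the location cells also sum to the 92% and 96% one-tone rates reported in the table.

```python
# Table 2: location of the fused tone, as a percentage of all trials,
# for the single 800/400 Hz dichotic pair.
TABLE2 = {
    "800 Hz left": {"left ear": 17, "left side": 30, "center": 32,
                    "right side": 8, "right ear": 5},
    "800 Hz right": {"left ear": 2, "left side": 6, "center": 20,
                     "right side": 28, "right ear": 40},
}

def toward_800_hz_ear(condition: str) -> int:
    """Percent of trials heard on the side of the ear that received 800 Hz.

    Illustrative grouping: 'ear' and 'side' responses are pooled.
    """
    locations = TABLE2[condition]
    side = "left" if condition.endswith("left") else "right"
    return locations[f"{side} ear"] + locations[f"{side} side"]

left = toward_800_hz_ear("800 Hz left")
right = toward_800_hz_ear("800 Hz right")
# The location cells sum to the one-tone rates in Table 2 (92% and 96%).
print(sum(TABLE2["800 Hz left"].values()), sum(TABLE2["800 Hz right"].values()))
print(f"800 Hz to left ear: {left}%; 800 Hz to right ear: {right}%")
```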

Perceived  pitch  difference 

The pitch-matching results of Experiments 2-5 are consistent with the notion that the auditory system performs some
type of weighted averaging of octave-related dichotic inputs to determine the pitch of the fused percept.² However, we can make no
specific claims concerning which aspect of pitch we were studying. Furthermore, the pitches we measured in the current study
seemed to be relatively consistent across subjects. The frequency of the fused pitch, $F_p$, then might be described by the simple
formula:

$$F_p = R\,F_r + L\,F_l \tag{1}$$

In this formula, R and L are relative weighting factors for the stimuli presented to the right and left ears; these weights are
assumed always to sum to unity (i.e., R = 1 - L). $F_r$ and $F_l$ are, respectively, the frequencies presented to the right and left
ears; in the current study $F_r$ and $F_l$ were either 800 or 400 Hz. If the perceived pitch were determined solely by the input to
the dominant (right) ear, then R in Eq. 1 would have to equal 1. Clearly this is not the case for the form of the illusion studied here.
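Eq. 1 is a convex combination of the two ear inputs. The sketch below (the function name `fused_pitch` is hypothetical) shows why strict ear dominance is ruled out: with R = 1 the fused pitch would simply equal the right-ear frequency, 800 or 400 Hz, whereas intermediate weights place the pitch between the two inputs, which is where the observed matches fall.

```python
def fused_pitch(f_right: float, f_left: float, r_weight: float) -> float:
    """Eq. 1: F_p = R*F_r + L*F_l, with the weights constrained so that L = 1 - R."""
    if not 0.0 <= r_weight <= 1.0:
        raise ValueError("R must lie between 0 and 1")
    l_weight = 1.0 - r_weight
    return r_weight * f_right + l_weight * f_left

# Pure right-ear dominance (R = 1) reproduces the right-ear frequency exactly.
print(fused_pitch(800.0, 400.0, 1.0))  # 800.0
print(fused_pitch(400.0, 800.0, 1.0))  # 400.0
# Equal weighting lands midway between the two inputs.
print(fused_pitch(800.0, 400.0, 0.5))  # 600.0
```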

Our results demonstrate a very strong, but not universal, tendency for subjects to hear a relatively higher pitch when
the higher frequency (800 Hz) was presented to the right ear, for sequences of 2, 4, and 12 dichotic pairs. It thus seemed more
reasonable to focus the formula initially on relative frequency rather than on ear of presentation. To accomplish this change of
focus, Eq. 2 is analogous to Eq. 1 but is modified to reflect weights for the higher and lower frequency tones:

$$F_p = H\,F_h + L\,F_l \tag{2}$$

$F_h$ and $F_l$ again are 800 and 400 Hz (now independent of ear of presentation); H and L are the respective weights for the 800 and
400 Hz tones and again must sum to unity. Because the perceived pitch depends on where the 800 Hz tone was presented, we must
solve Eq. 2 using a different pair of high- and low-frequency weights for each of these two presentation conditions.³ The two
pairs of weights should therefore reflect the perceived higher and lower pitches when 800 Hz was presented to either the right or
the left ear, with the difference in weights reflecting the contribution of ear advantage.


Insert  Table  3  about  here 


Table 3 summarizes the H and L weights from Eq. 2 as a function of sequence length and perceived pitch. The
weights H and L are consistent across different sequence lengths: the means for H and L, respectively, were 0.36 and 0.64 when
the higher pitch was perceived, and 0.26 and 0.74 when the lower pitch was perceived. These weights quantify the claim that
the perceived pitch is a product of combining the frequencies presented to the two ears.
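Because H + L = 1, Eq. 2 can be inverted to give H = (F_p - F_l)/(F_h - F_l). Applying this to the matched frequencies in Table 3 recovers the tabled weights to within about 0.01. A minimal sketch; the helper name `weights_from_match` is ours, for illustration:

```python
F_HIGH, F_LOW = 800.0, 400.0  # Hz

def weights_from_match(f_p: float) -> tuple[float, float]:
    """Invert Eq. 2 (F_p = H*F_h + L*F_l with H + L = 1) to get (H, L)."""
    h = (f_p - F_LOW) / (F_HIGH - F_LOW)
    return h, 1.0 - h

# Table 3 rows: (sequence length, high-pitch match, tabled H, low-pitch match, tabled H).
TABLE3 = [
    ("2-pair", 535, 0.34, 508, 0.28),
    ("4-pair (Levitt)", 550, 0.38, 501, 0.25),
    ("4-pair", 548, 0.37, 510, 0.28),
    ("12-pair", 538, 0.35, 497, 0.24),
]

for label, f_hi, h_hi_tab, f_lo, h_lo_tab in TABLE3:
    h_hi, _ = weights_from_match(f_hi)
    h_lo, _ = weights_from_match(f_lo)
    print(f"{label:16s} high: H={h_hi:.3f} (table {h_hi_tab:.2f})  "
          f"low: H={h_lo:.3f} (table {h_lo_tab:.2f})")
```

The same inversion applied to the Table 3 means (543 and 504 Hz) gives H of about 0.36 and 0.26.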

The 1-pair pitch-match results (Experiment 5) were based solely on presentation condition and not on relative pitch.
Experiment 5 indicated that there was a strong, but not absolute, tendency for the location of a fused tone to be correlated with
the ear receiving the higher frequency tone, as reported by Deutsch (1978, 1981). If these measurements of localization tendency
reflect similar tendencies for the underlying pitch process (e.g., if perceived pitch and location are highly correlated), then we should
be able to use the results summarized in Table 3 to predict the pitch matches for the one-pair condition. Our new formula
substitutes the mean values of H and L (from Table 3) into Eq. 2 for each of the two perceived pitches. The portion of the
formula for each pitch is then weighted by the probability of localization toward each ear. The resulting formula is

$$F_p = P_r\,(0.36\,F_h + 0.64\,F_l) + P_l\,(0.26\,F_h + 0.74\,F_l) \tag{3}$$

where $P_r$ and $P_l$ are the relative probabilities of localization toward each ear, as determined in Experiment 5. Eq. 3 must be
applied separately for each of the two presentation conditions (800 Hz to the right or left ear). The predicted values of $F_p$ are 513 and 540
Hz (a 27 Hz difference). Both actual mean pitch matches were almost 10 Hz higher than predicted. However, in this variant of the
octave illusion, listeners perceived a consistent difference in frequency between the two pitches but varied in the absolute values of
the pitches. Thus, the difference in pitch may be more important than the individual frequencies of either the perceived high or
low pitch. The observed frequency difference of 24 Hz is sufficiently similar to the predicted 27 Hz difference to
suggest that fusion of a single pair of octave-related tones reflects the weighted combination of inputs to both ears, and that
fusion seems to be the most critical contributing factor to the octave illusion.
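Eq. 3 can be evaluated directly. The text does not state exactly how $P_r$ and $P_l$ were derived from Experiment 5; one reading that reproduces the 27 Hz predicted difference is to pool the "ear" and "side" responses in Table 2, drop the "center" responses, and normalize. The sketch below adopts that assumption (flagged in the code) and yields predictions of roughly 540 Hz (800 Hz to the right ear) and 513 Hz (800 Hz to the left ear).

```python
# Mean Table 3 weights for the perceived high pitch and the perceived low pitch.
H_HI, L_HI = 0.36, 0.64
H_LO, L_LO = 0.26, 0.74
F_H, F_L = 800.0, 400.0  # Hz

# Table 2 location percentages pooled by side (ear + side responses).
# ASSUMPTION: "center" responses are dropped before normalizing to P_r and P_l.
SIDE_COUNTS = {
    "800 Hz right": {"right": 28 + 40, "left": 2 + 6},
    "800 Hz left": {"right": 8 + 5, "left": 17 + 30},
}

def eq3(p_r: float) -> float:
    """Eq. 3: F_p = P_r(0.36 F_h + 0.64 F_l) + P_l(0.26 F_h + 0.74 F_l)."""
    p_l = 1.0 - p_r
    return p_r * (H_HI * F_H + L_HI * F_L) + p_l * (H_LO * F_H + L_LO * F_L)

predicted = {}
for condition, counts in SIDE_COUNTS.items():
    p_r = counts["right"] / (counts["right"] + counts["left"])
    predicted[condition] = eq3(p_r)
    print(f"{condition}: P_r = {p_r:.3f}, predicted F_p = {predicted[condition]:.1f} Hz")

difference = predicted["800 Hz right"] - predicted["800 Hz left"]
print(f"predicted difference: {difference:.1f} Hz")
```

Including the center responses (for example, splitting them evenly between sides) would shrink the predicted difference, so the choice matters.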

The weights in Table 3 also reflect the relative contributions of fusion and ear dominance. The weight H was 0.36
when 800 Hz was presented to the right ear and 0.26 when 800 Hz was presented to the left ear. Thus, the input to the right
ear is always weighted more heavily, by 0.1, than the input to the left ear. This weighting can be used as the basis of a more
detailed quantification to account for the frequencies of the perceived pitches. Because the weights sum to unity, we can convert
them to percentages: average perceived pitch can be represented as the sum of 26% of the higher frequency, 64% of the
lower frequency, and 10% of the frequency presented to the right ear. For example, when the 800 Hz tone was presented to the
right ear and the 400 Hz tone to the left ear, the pitch can be predicted using the equation
$$F_p = 0.26 \times 800 + 0.64 \times 400 + 0.10 \times 800 \tag{4}$$

yielding a predicted $F_p$ of 544 Hz. Using the same equation, $F_p$ is 504 Hz when the 400 Hz tone is presented to the right ear (a
predicted 40 Hz difference). The actual means of the perceived pitches, 543 and 504 Hz (a 39 Hz difference), are consistent
with these computed frequencies and are independent of sequence length.
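The percentage form translates directly into a small function. The helper name `eq4` is ours, and treating the 10% term as "whichever frequency reaches the right ear" is our reading of the text; with the 800/400 Hz stimuli it reproduces the 544 and 504 Hz predictions.

```python
def eq4(f_right: float, f_left: float) -> float:
    """26% of the higher frequency + 64% of the lower frequency
    + 10% of the frequency presented to the right ear."""
    f_high, f_low = max(f_right, f_left), min(f_right, f_left)
    return 0.26 * f_high + 0.64 * f_low + 0.10 * f_right

hi = eq4(800.0, 400.0)  # 800 Hz to the right ear
lo = eq4(400.0, 800.0)  # 400 Hz to the right ear
print(round(hi), round(lo), round(hi - lo))  # 544 504 40
```

The predicted 40 Hz spread comes entirely from the right-ear term (0.10 × 800 versus 0.10 × 400), which is how the model isolates the ear advantage.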

The estimated weights (H and L) are important not in terms of their specific values, but rather in terms of the
processing they reveal. Weighted averaging of inputs is consistent with a large literature suggesting a central mechanism that
is sensitive to octave-related stimuli (e.g., Demany & Semal, 1988, 1990; Demany, Semal, & Carlyon, 1991). Ear differences in
tonal input weighting also have been reported. For example, Ward (1954) found that the two ears of a single observer gave
different pitch-match results, with one ear consistently giving relatively higher pitch matches for a specific tone (i.e., binaural
diplacusis).

Existing pitch models

The nature of fused pitches in the illusion cannot be easily explained by models of pitch perception of complex tones;
the focus of these models is the explanation of the residue and other pitch percepts in complex stimuli where pitch does not
correspond to place along the basilar membrane. For reasons discussed below, we feel that the pitch percepts studied in
the current research represent a very different type and level of processing than that addressed by such models. There are two
broad classes of modern models of complex pitch perception: pattern recognition models and temporal models. Pattern
recognition models assume that the pitch of a complex tone is based upon neural signals corresponding to primary sensations, e.g.,
the pitches of the individual partials. Goldstein's optimum processor theory (1973), Terhardt's (1972a, b, 1974) pitch perception
theory, and Houtsma and Goldstein's (1972) central pitch processing theory belong to this group. Goldstein's model (1973) predicts that
the pitch of a complex tone corresponds to the "best fit" of a harmonic series to the components of the complex tone, while Terhardt's (1974)
model suggests that the perceived pitch of a complex tone would always be a subharmonic of a dominant (resolvable) partial
rather than of the lowest partial. By contrast, we are describing a single pitch percept for stimuli that should be resolvable (in
terms of critical band differences) even if presented to one ear, and whose partials should be perfectly coincident.

Temporal models assume that the pitch of a complex tone is based upon the time interval between corresponding
points in the fine structure of the signal close to adjacent envelope maxima (Schouten, Ritsma, & Cardozo, 1962; Wightman,
1973). Schouten's theory suggested that the pitch of a complex tone corresponds to the most prominent component in that
sound. Wightman's pattern-transformation theory was not aimed at predicting pitch-match data. However, because one of our
stimuli is a harmonic of the other, there is no simple, predictable fine temporal structure among potential partials that does
not correspond to the temporal properties of one of the original stimuli. In any case, the pitch models all were developed to
address very different concerns related to pitch perception. Furthermore, the uniqueness of the octave relationship probably
provides the perceptual system with unusual, and most likely modified, sensory information. It would be interesting to
evaluate the pitch of dichotic, octave-related stimuli above 5 kHz, where most models, and evidence, indicate that the
coding of pitch operates on a different basis.
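As a rough illustration of the temporal-model point (not an implementation of Schouten et al.'s or Wightman's models), a plain autocorrelation pitch estimate applied to an equal-amplitude 400 + 800 Hz mixture finds its strongest periodicity at 2.5 ms, i.e., 400 Hz, the period of the lower stimulus, rather than anything near the intermediate 504-544 Hz matches reported above. The sampling rate, duration, search range, and the single-channel mixing (the real stimuli were dichotic) are all our assumptions.

```python
import math

FS = 16_000  # sampling rate in Hz (arbitrary illustrative choice)
N = 800      # 50 ms of signal at FS

def two_tone(f1: float, f2: float) -> list[float]:
    """Single-channel, equal-amplitude sum of two sinusoids.

    Simplification: the actual illusion stimuli were dichotic, not mixed.
    """
    return [math.sin(2 * math.pi * f1 * n / FS) + math.sin(2 * math.pi * f2 * n / FS)
            for n in range(N)]

def autocorr_pitch(x: list[float], f_min: float = 270.0, f_max: float = 800.0) -> float:
    """Pitch estimate = FS / lag of the largest (unnormalized) autocorrelation
    value among lags corresponding to frequencies between f_max and f_min."""
    best_lag, best_r = 0, -math.inf
    for lag in range(int(FS / f_max), int(FS / f_min) + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return FS / best_lag

print(autocorr_pitch(two_tone(400.0, 800.0)))  # 400.0 (a 2.5 ms period)
```

At the 800 Hz period (20 samples) the two components cancel in the autocorrelation, so the only strong peak in range is at the 400 Hz period.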

Individual  differences  in  perception 

As reported by Deutsch (1974, 1980, 1983a), we consistently observed individual differences in the perception of the
reported variant of the octave illusion: many listeners usually perceived the illusion, but some listeners seldom did.
Furthermore, we typically found that the extent of musical experience seemed to be negatively correlated with
perception of this specific variant of the illusion.

The difference in perceptual tendencies between musicians and nonmusicians in the illusion may be due to the nature
of musical training. Musicians are trained to listen to music in an "analytic" fashion, with many trained to recognize
both tone location and the instruments playing the tones. This training may account for why musicians often are aware of
two simultaneous tones in an illusion sequence. That musical training produces behavioral differences has been
well documented (e.g., Helmholtz, 1863; Houtsma and Goldstein, 1972; Houtsma, 1979; Cross and Lane, 1963). Helmholtz
(1863) reported that complex periodic sounds can be perceived "synthetically" or "analytically" (i.e., perceived as one sound or in
terms of individual partials). Cross and Lane (1963) reported that listening "synthetically" and "analytically" can be controlled by
previous training, and Houtsma (1979) has suggested that musically experienced listeners have a much stronger tendency to
perceive complex sounds analytically than musically naive listeners. Although not addressed in the current research, it also is
possible that an "analytic-holistic" distinction may reflect either differences in hemispheric dominance (e.g., Deutsch 1982)
or, alternatively, differences in the efficiency of encoding component stimuli.

In contrast to our musically trained subjects, results obtained with musically naive listeners indicate a holistic percept in
which pitch seems to reflect a weighted averaging of both components. Such averaging of dichotic information must
reflect processing that occurs at a more central level, rather than at (or directly derived from the operation of) the sensory
mechanism. We note that musicians also often report perceiving an additional fused percept shifting in localization, which
probably is the output of a central, octave-related mechanism responsible for fusion. Existing pitch models are based directly
on the output of peripheral (sensory) processing, and thus are not designed to address such a weighting of information
for a global percept.

We thus claim that our results address a different type and level of processing than a century of
research on the manner in which listeners normally perceive pitch. In fact, we believe that our findings, and those of Deutsch
on the original variant of the octave illusion, both probably reflect an inability to segregate and evaluate component
stimuli rather than a general difficulty in accurately encoding frequency at the sensory level. Indeed, most of our musically
naive subjects could accurately match the isolated stimuli, but still matched illusion pitches as intermediate between 400 and 800 Hz.

We thus believe that we have been investigating an attentional limit based on the analytic-holistic distinction.
Additional support for this notion comes from the fact that musicians also reported a type of holistic percept, which shifted in
lateralization with the alternation of stimuli. Although it may be difficult for musicians to match the pitch of this weak holistic
percept (having additionally to ignore the strong pitches of the perceptually isolated components), it is conceivable that this holistic
percept may in fact be similar in nature to the illusory percept commonly reported by our musically naive subjects. Thus, the
variant of the illusion could potentially also reflect frequency averaging in musicians, and this possibility merits further
research. Currently, however, regardless of the types of processing reflected by individual differences in this variant of
the octave illusion, existing evidence clearly indicates that musical experience is negatively correlated with perception of
this variant of the illusion.

Summary

The current study indicates that the critical contributing factor to the octave illusion is dichotic fusion, which provides
the basis for the perception of one tone at a time. A secondary contributing factor is a right-ear advantage in weighting input;
this ear advantage contributes to the slight shift in pitch for this variant of the illusion.

The underlying processing of dichotic pairs of octave-related stimuli, in terms of the perception of pitch and location, is
not easily explained by current models of pitch perception, whose goals and focus are not on such simple stimuli. As has been
conjectured for pitch, there does appear to be a central mechanism responsible for frequency averaging of octave-related dichotic
tones for the current listeners. However, it is not obvious whether such a mechanism is restricted to octave-related stimuli.

Since fusion, the primary contributing factor to this illusion, is not limited to octaves, it is possible that the illusion could be
produced under a wide variety of stimulus conditions. In a future submission, we will present data demonstrating a
similar illusion based upon frequency components that are not octave-related.


References

Akerboom, S., Hoopen, G., & Knoop, A. (1985). Does the octave illusion evoke the interaural tempo illusion?
Perception & Psychophysics, 38(3), 281-285.

Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

Cross, D., and Lane, H. (1963). Attention to single stimulus properties in the identification of complex tones. In
Experimental analysis of the control of speech production and perception. University of Michigan ORA Rep.
No. 05613-1-P.

Cutting, J. (1976). Auditory and linguistic processes in speech perception: Inferences from six fusions in dichotic
listening. Psychological Review, 83(2), 114-140.

Demany, L., and Semal, C. (1988). Dichotic fusion of two tones one octave apart: Evidence for internal octave
templates. Journal of the Acoustical Society of America, 83, 687-695.

Demany, L., and Semal, C. (1990). Harmonic and melodic octave templates. Journal of the Acoustical Society of
America, 88, 2126-2135.

Demany, L., Semal, C., and Carlyon, R. (1991). On the perceptual limits of octave harmony and their origin. Journal
of the Acoustical Society of America, 90, 3019-3027.

Deutsch, D. (1974a). An auditory illusion. Journal of the Acoustical Society of America, 55, S18-S19.

Deutsch, D. (1974b). An auditory illusion. Nature, 251, 307-309.

Deutsch, D. (1975a). Musical illusions. Scientific American, 233, 92-104.

Deutsch, D. (1975b). Two-channel listening to musical scales. Journal of the Acoustical Society of America, 57,
1150-1160.

Deutsch, D. (1976). Lateralization by frequency in dichotic tonal sequences as a function of interaural amplitude and
time differences. Journal of the Acoustical Society of America, 60, S50(A).

Deutsch, D. (1978a). Binaural integration of tonal patterns. Journal of the Acoustical Society of America, 64, S146(A).

Deutsch, D. (1978b). Lateralization by frequency for repeating sequences of 400-Hz and 800-Hz tones. Journal of the
Acoustical Society of America, 63, 184-186.

Deutsch, D. (1981). The octave illusion and auditory perceptual integration. In J. V. Tobias and E. D. Schubert (Eds.),
Hearing research and theory (Vol. 1). New York: Academic Press.

Deutsch, D., and Roll, P. L. (1976). Separate "what" and "where" decision mechanisms in processing a dichotic tonal
sequence. Journal of Experimental Psychology: Human Perception and Performance, 2, 23-29.

Deutsch, D. (1988). Lateralization and sequential relationships in the octave illusion. Journal of the Acoustical Society
of America, 83(1), 365-369.

Efron, R., and Yund, E. W. (1974). Dichotic competition of simultaneous tone bursts of different frequency. I.
Dissociation of pitch from lateralization and loudness. Neuropsychologia, 12, 249-256.

Efron, R., and Yund, E. W. (1975). Dichotic competition of simultaneous tone bursts of different frequency. III. The
effect of stimulus parameters on suppression and ear dominance functions. Neuropsychologia, 13, 151-161.

Fletcher, H., and Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the
Acoustical Society of America, 5, 82-108.

Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch of complex tones.
Journal of the Acoustical Society of America, 54, 1496-1516.

Houtsma, A. J. M. (1979). Musical pitch of two-tone complexes and predictions by modern pitch theories. Journal of
the Acoustical Society of America, 66, 87-99.

Houtsma, A. J. M., and Goldstein, J. L. (1972). The central origin of the pitch of complex tones: Evidence from
musical interval recognition. Journal of the Acoustical Society of America, 51(2), 520-529.

Schouten, J. F., Ritsma, R. J., and Cardozo, B. L. (1962). Pitch of the residue. Journal of the Acoustical Society of America,
34, 1418-1424.

Terhardt, E. (1972a). Zur Tonhöhenwahrnehmung von Klängen. I. Psychoakustische Grundlagen. Acustica, 26, 173-186.

Terhardt, E. (1972b). Zur Tonhöhenwahrnehmung von Klängen. II. Ein Funktionsschema. Acustica, 26, 187-199.

Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, 55, 1061-1069.

von Helmholtz, H. L. F. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der
Musik. Braunschweig: F. Vieweg & Sohn.

Ward, W. D. (1954). Subjective musical pitch. Journal of the Acoustical Society of America, 26, 369-380.

Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the Acoustical Society of America, 54,
407-416.


Acknowledgments 

Based upon work supported by National Science Foundation Grant BNS8911456 and Grant F49620-93-1-0033 from the
Air Force Office of Scientific Research. Opinions, findings, conclusions, and recommendations are the authors' and do not
necessarily reflect the views of the granting agencies.


Endnotes 


1. Highly practiced musicians tended to hear two simultaneous complex sounds which were toward, but not in, each ear and whose
pitches, while different from each other, seemed to be intermediate to the physical stimuli.

2. Although there are several different pitch-related percepts, our purpose was to study the pitch changes associated with perception
of the illusion, and we did not make any attempt to explain or explicitly instruct the subjects to respond to any specific aspect of
perceived pitch.

3. In a sequence of stimuli we cannot perfectly correlate a specific perception with the presentation of stimuli to a specific ear, but
we can determine whether the subject is responding to the higher or lower pitch, and we do know that such pitch perception is highly
stable within individual subjects. We therefore must apply the formula based upon perceived pitch, assuming a difference in the H
and L weights for the two percepts. We are faced with the opposite problem when we present a single pair (half cycle) of stimuli:
here we know what was presented on each trial, but not which pitch was perceived; we know only the overall pitch average for each
presentation condition.


[Table 1 diagram: six response patterns, shown as left-ear/right-ear percepts over time, including "one tone" patterns and two-tone patterns (tone 1, tone 2). L: low-pitch tone; H: high-pitch tone.]

Table 1: The six perceptual patterns given to Exp. 1 subjects defining distinct responses.


Reported perception        800 Hz to left ear    800 Hz to right ear
2 tones                            8                      4
1 tone                            92                     96
1-pitch location:
   Left ear                       17                      2
   Left side                      30                      6
   Center                         32                     20
   Right side                      8                     28
   Right ear                       5                     40

Table 2: Number of perceived tones and location of the fused tone given a single (800/400 Hz) dichotic stimulus pair
(expressed as percentage of trials).


                      Perceived high pitch          Perceived low pitch
Sequence length     Freq. (Hz)    H      L        Freq. (Hz)    H      L
2-pair                 535       .34    .66          508       .28    .72
4-pair (Levitt)        550       .38    .62          501       .25    .75
4-pair                 548       .37    .63          510       .28    .72
12-pair                538       .35    .65          497       .24    .76
Mean                   543       .36    .64          504       .26    .74

Table 3: Weighting factors (H/L) for different sequence lengths.




Figure  Captions 

Figure 1. The original stimulus configuration and most typical reported perception of the octave illusion, from Deutsch (1974a).

Figure 2. Stimulus configuration and typical illusory perception for Experiment 1 of the current study.

Figure 3. Pitch-matching results in Experiment 2 using the Levitt up-down procedure. The regression line (y = 0.91x + 95.3) indicates
the consistent perceived pitch difference.

Figure 4. Pitch-matching results for different sequence lengths in Experiments 2, 4, and 5 using the method of constant stimuli.
The filled circles and filled squares respectively represent individual pitch matches for 4-pair and 12-pair sequences in Experiment
2. The open squares represent pitch matches for 2-pair sequences in Experiment 4. Finally, the open triangles represent pitch
matches for 1 dichotic pair of tones in Experiment 5; the regression line is computed only for data from this condition.

Figure 5. Individual pitch-matching results for monaural sequences of alternating 800 and 400 Hz tones (open symbols), as well as
for 4-pair illusion sequences (filled symbols). Circles represent data from the 3 subjects who most accurately matched monaural
stimuli. The square represents the data from 1 subject who could not accurately match monaural stimuli. The triangles represent
data from the remaining 6 subjects, who matched monaural stimuli with moderate accuracy. The matches to the illusion sequence
are significantly different from the monaural sequences, and are similar to the mean results of Experiment 2 for the same sequence
length (the dark asterisk).


[Figure 2: stimuli (left ear / right ear, 400 and 800 Hz tones alternating over time) and typical percept (H tone and L tone).]

[Pitch-matching figures, y-axis: Perceived high pitch (Hz).]

[Figure 5: Method of Constant Stimuli Replication.]


An  Auditory  Analogue  to  Feature  Integration* 

Michael  D.  Hall  and  Richard  E.  Pastore 

Psychoacoustics  Laboratory,  Department  of  Psychology 
State  University  of  New  York  at  Binghamton 
Binghamton,  NY  13902-6000 


Two  experiments  were  conducted  to  establish  feature  integration  for  audition  using  search 
tasks  analogous  to  those  typically  used  in  vision.  Arrays  of  varying  complexity  (#  of  items)  were 
constructed  using  musical  tones  distinguished  by  their  pitch,  timbre,  and  location.  Subjects 
searched  arrays  for  cued  pitches  timbres,  or  conjunctions  of  both. 

Subjects reported, with high confidence, incorrect conjunctions of pitch and timbre that had
been presented only separately. Also, conjunction search latencies increased, and accuracy
decreased, with increased array complexity. Following the logic from visual research, these
findings reflect the focusing of analogue attention to conjoin features.

A similar pattern of results was obtained for searches focusing on only pitch or timbre.
Thus, pitch and timbre may themselves be feature conjunctions. However, given extensive experience with
the stimuli, instrument timbres may be processed more like distinct features.

Introduction

Theory 

A number of generalizable models of attention have recently come from vision research.
One such model is Feature Integration Theory, or FIT (e.g., Treisman & Gelade, 1980).
According to FIT, attention is initially distributed, and the observer abstracts in parallel all
individual features and overlearned (automatized) feature conjunctions. Analogue attention is
then focused to integrate features at a location, and thus perceive objects.

Original evidence for FIT comes from visual search tasks, where S's search for single
features or feature conjunctions in arrays of varying complexity. Whereas single-feature search
times are relatively unaffected by array complexity, conjunction search times increase linearly
with increasing array complexity, presumably reflecting focused attention. Also, when attention
becomes overloaded, as when many items are presented, illusory conjunctions (wrong, but
confidently perceived combinations of presented features) occur often.
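The search-slope logic above can be made concrete by fitting a least-squares line to mean search time as a function of array size. This is only an illustrative sketch: the array sizes and mean RTs below are hypothetical values, not data from this study, and search_slope is a helper written for this example.

```python
def search_slope(set_sizes, mean_rts):
    """Ordinary least-squares slope (ms per item) and intercept (ms)."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical mean RTs (ms) for illustration only.
sizes = [2, 4, 8]
feature_rts = [620, 625, 630]        # nearly flat: parallel search pattern
conjunction_rts = [700, 820, 1060]   # rising: serial search pattern

for label, rts in [("feature", feature_rts), ("conjunction", conjunction_rts)]:
    slope, intercept = search_slope(sizes, rts)
    print(f"{label}: {slope:.1f} ms/item, intercept {intercept:.0f} ms")
```

A slope near zero is the pattern FIT attributes to parallel single-feature search; a steep positive slope (60 ms per item with these made-up numbers) is the pattern it attributes to serial, attention-demanding conjunction search.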

Possible Auditory Applications

Although illusory conjunctions have been conjectured to occur in audition, such as with
the mislocalization of components of dichotic musical chord stimuli (e.g., Hall & Pastore, 1992),
no direct test of FIT has been made for audition. The current study therefore used musical
stimuli and methods analogous to visual search tasks to evaluate the applicability of FIT
to audition. Experiment 1 examined search performance as a function of array complexity both
for assumed single features and for conjunctions of those features. Conditions were included in which
illusory conjunctions were evaluated. Experiment 2 then attempted to verify suggestions of
parallel search for single features by examining the effects of experience with the stimuli.


*  Poster  presented  at  the  34th  annual  meeting  of  the  Psychonomic 
Society,  Washington,  DC,  November  5, 1993 


[Figure: Feature Integration Theory schematic (Location Map).]


General  Method 


(1)  Search  Cues 


(3)  Array  Presentation 


(4)  Question:  "Was  the  cued  target  in  array?" 

(a)  "Yes"  or  "No"  Response 

(b)  Confidence  Rating  (conjunction  trials  only) 

1  2  3  4  5  6  7
(1 = confident yes, 4 = unsure, 7 = confident no)




(5) Nature of Search Cues

Search type              Cues in Array
Single-Feature Search    Valid, Invalid
Conjunction Search**     Valid, Invalid(+), Invalid(-)

Confident "yes" response (to an Invalid(+) cue) = illusory conjunction.
**Experiment 1 only.


Conjunction? 


Cues: 

-"Single-feature" (timbre or pitch) or "feature conjunction" search.

-Timbre cues = visual characters for instrument: violin, piano, trombone, clarinet, or flute.
-Pitch cues = sine tones: 262, 370, 509, 762, or 1078 Hz.

-Conjunction cues = pitch by natural timbre.

-Cues: "Valid" (in array) or "Invalid" (not in array): p = 0.5

-Features of invalid conjunction cues:
"Invalid(+)" (present, not conjoined): p = 0.5;
"Invalid(-)" (none in array): p = 0.5.


-Confident  positive  response  for  Invalid(+)  cue  =  illusory  conjunction. 
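The cue taxonomy above amounts to a set-membership rule. The sketch below classifies a conjunction cue against a presented array; representing each tone as a (pitch, timbre) pair and the name classify_conjunction_cue are assumptions made for illustration. The "unclassified" branch covers a case (only one cued feature present) that the design described here does not use.

```python
from typing import List, Tuple

Tone = Tuple[int, str]  # (pitch in Hz, instrument timbre); hypothetical representation

def classify_conjunction_cue(cue: Tone, array: List[Tone]) -> str:
    """Label a conjunction cue as valid, invalid(+), or invalid(-) relative to an array."""
    pitch, timbre = cue
    if cue in array:
        return "valid"                 # the conjunction itself was presented
    pitches = {p for p, _ in array}
    timbres = {t for _, t in array}
    if pitch in pitches and timbre in timbres:
        return "invalid(+)"            # both features present, but never together
    if pitch not in pitches and timbre not in timbres:
        return "invalid(-)"            # neither feature present
    return "unclassified"              # one feature present: not a cue type in this design

array = [(262, "piano"), (762, "flute")]
for cue in [(262, "piano"), (262, "flute"), (370, "violin")]:
    print(cue, classify_conjunction_cue(cue, array))
```

Under the scoring rule above, only a confident "yes" on a cue that this function labels invalid(+) would count as an illusory conjunction.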


Arrays: 

-Varied complexity (# of tones) and tone distance (near vs. far).

-Unique tone localization produced by:
(1) manipulating interaural time disparities,
(2) inharmonic pitches separated by approximately 1/2 octave, with distinct timbres (see above).
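Both array properties above can be checked or sketched numerically. The listed pitch cues are only approximately half an octave apart (log2 of successive frequency ratios ranges from about 0.46 to 0.58), and an interaural time disparity can be imposed by delaying one ear's copy of a tone. The sketch assumes a whole-sample delay, a 44.1 kHz sampling rate, and a 250 ms tone; none of these parameters comes from the study.

```python
import math

PITCHES_HZ = [262, 370, 509, 762, 1078]  # pitch cues listed above

def octave_steps(freqs):
    """Distance in octaves between successive frequencies."""
    return [math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

def dichotic_tone(freq_hz, itd_s, dur_s=0.25, fs=44100):
    """Left/right sample lists for a sine tone; a positive ITD delays the right ear.

    The delay is rounded to whole samples (about 22.7 microseconds per sample at
    44.1 kHz), an assumed simplification. The leading (left) ear should dominate
    lateralization.
    """
    n = int(round(dur_s * fs))
    lag = int(round(itd_s * fs))
    tone = [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]
    left = tone + [0.0] * lag
    right = [0.0] * lag + tone
    return left, right

print([round(s, 3) for s in octave_steps(PITCHES_HZ)])
left, right = dichotic_tone(262, 0.0005)  # hypothetical 0.5 ms ITD
```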


-Significantly decreased accuracy and increased RT with increased array complexity (2-
vs. 4-tone); not consistent with (feature) parallel search.

-Significantly faster responses to valid cues; typical of search results.

-No distance effect (2-tone Near vs. Far).

-Since many S's were pianists, piano timbre may have acted as a single (overlearned or
acquired) feature. Accuracy increased, and RT decreased (thus better
approximating parallel search), but there was still some indication of array
complexity effects.


[Figure: Reaction Time (ms).]


Illusory Conjunctions

Mean Error Rates = illusory conjunctions + general errors

                 Audition: Hall & Pastore ('93)    Vision: Treisman & Schmidt ('82)
                 (minus timbre confusions)
Near (2-Tone)    .18
Far (2-Tone)     .25
4-Tone           .24
Mean             .23                               .16

High Confidence Error Rates (minimizes general errors)

Near (2-Tone)    .12
Far (2-Tone)     .17
4-Tone           .14
Mean             .14                               .15

Feature  Integration  Evidence:  Displayed  for  confidence  ratings,  10  S's 
(Similar  results  for  "Yes/No"  task,  but  with  faster  RTs) 


-Significantly less confident*, less accurate*, and slower ratings with increased array
complexity (2- vs. 4-tone); consistent with predicted attention-demanding serial search.

-Significantly fewer high-confidence responses, more errors, and slower RT on Invalid(+)
cue trials (see below).

-Moderate rates of errors where subjects responded with high confidence on Invalid(+)
cue trials: strong evidence of illusory conjunctions. Estimated rates are
comparable to rates in vision.

*Marginal simple effects for Invalid(+) cue trials.
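The two error measures in the table above (all "yes" responses to Invalid(+) cues, and only high-confidence "yes" responses) can be computed from the 7-point ratings as sketched below. The cut-offs are assumptions: a rating of 1 is treated as a high-confidence "yes," and ratings 1 to 3 as any "yes"; the trial list is hypothetical.

```python
def error_rates(trials, confident_max=1, yes_max=3):
    """Error rates on Invalid(+) trials from (cue_type, rating) pairs.

    Ratings run from 1 (confident yes) to 7 (confident no). Returns
    (any-yes rate, high-confidence-yes rate); the cut-offs are assumed.
    """
    ratings = [r for cue, r in trials if cue == "invalid(+)"]
    if not ratings:
        return 0.0, 0.0
    n = len(ratings)
    any_yes = sum(r <= yes_max for r in ratings) / n
    confident_yes = sum(r <= confident_max for r in ratings) / n
    return any_yes, confident_yes

# Hypothetical trials for illustration only.
trials = [("invalid(+)", 1), ("invalid(+)", 3), ("invalid(+)", 6),
          ("invalid(+)", 7), ("valid", 1)]
print(error_rates(trials))  # (0.5, 0.25)
```

Restricting the count to high-confidence responses trades sensitivity for specificity: it drops some genuine illusory conjunctions but also excludes most guessing errors, which is why that measure "minimizes general errors."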


Experiment  2:  Experience  Effects 


Extended  Timbre  Search 


[Figure panels: first 2 hours; last 2 hours.]


Displayed  for  2  pianists  (Similar  tendency  for  other  2  S's) 


Accuracy: 

-Significantly decreased overall accuracy with increased array complexity (marginal for piano
cue trials).

-No practice effects.

-Reduced difference for piano cue trials (ceiling effect?).

RT: 

-Significantly increased overall RT with increased array complexity, consistent with serial search.
Also, significantly faster on valid cue trials (i.e., self-terminating search).

-Marginally significant reduction of RT with practice.

-Reduced effects of array complexity given more practice, especially on piano cue trials,
revealing a trend toward automaticity.
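Faster responses on valid-cue trials are what a serial self-terminating search predicts: on average a present target is found after inspecting (N + 1)/2 of N items, whereas an absent target requires inspecting all N. The sketch below computes these standard predictions; the base time and per-item cost are hypothetical parameters, not estimates from these data, and the Monte Carlo helper simply checks the (N + 1)/2 average.

```python
import random

def predicted_rt(n_items, target_present, base_ms=500.0, per_item_ms=40.0):
    """Mean RT under serial self-terminating search (hypothetical parameters)."""
    inspected = (n_items + 1) / 2 if target_present else n_items
    return base_ms + per_item_ms * inspected

def simulated_inspections(n_items, trials=20000, seed=1):
    """Target at a uniformly random serial position; search stops when it is found."""
    rng = random.Random(seed)
    return sum(rng.randint(1, n_items) for _ in range(trials)) / trials

present_slope = (predicted_rt(4, True) - predicted_rt(2, True)) / 2
absent_slope = (predicted_rt(4, False) - predicted_rt(2, False)) / 2
print(present_slope, absent_slope)  # 20.0 40.0
```

The predicted target-present slope is half the target-absent slope, the 1:2 ratio commonly used as a signature of self-terminating serial search; a flattening of both slopes with practice would be consistent with the trend toward automaticity noted above.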


Conclusions  &  Future  Directions 

(1)  Evidence  for  serial  search  and  illusory  conjunctions 
argue  for  feature  integration  in  audition. 

(2)  Timbre  and  pitch  each  appear  to  represent  conjunctions 
of  primitive  features. 

(3)  With  extensive  experience,  a  timbre  (e.g.,  piano)  can  be 
processed  more  automatically  (i.e.,  akin  to  a  single-feature). 

(4)  The  single-feature  search  task  can  be  used  to  define 
auditory  (e.g.,  timbral)  features. 




References 


Hall, M. D. & Pastore, R. E. (1992). Effects of base complexity in musical duplex perception.
Proceedings of the 123rd Meeting of the Acoustical Society of America, 91(4, pt. 2),
2339 (abstract 2SP7).

Snodgrass, J. G. & Townsend, J. T. (1980). Comparing parallel and serial models: Theory and
implementation. Journal of Experimental Psychology: Human Perception and
Performance, 6(2), 330-354.

Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for
objects. Journal of Experimental Psychology: Human Perception and Performance, 8,
194-214.

Treisman, A. (1992). Perceiving and re-perceiving objects. American Psychologist, 47(7),
862-875.

Treisman, A. & Gelade, G. (1980). A feature-integration theory of attention. Cognitive
Psychology, 12, 97-136.

Treisman, A. & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive
Psychology, 14, 107-141.


Work supported by AFOSR grants
F49620-93-1-0033 and F49620-93-1-0327,
and NSF grant BNS8911456


Implicit assumptions in modeling higher level auditory processes
Richard E. Pastore

Center for Cognitive & Psycholinguistic Sciences
Binghamton University
Binghamton, New York 13902-6000

Abstract 

There  has  been  growing  interest  in  the  investigation  of  auditory  stimulus  processing  at  levels  considered  to  be  clearly  beyond 
or  above  the  limits  imposed  by  the  peripheral  auditory  system.  Efforts  to  investigate  such  higher  levels  of  processing  of  complex 
stimuli  are  nearly  always  based  upon  assumptions  about  perceptual  and  decision  processes  that  limit  the  range  of  reasonably  valid 
conclusions.  Such  assumptions  are  usually  implicit  and  often  not  immediately  recognized.  To  illustrate  the  critical  role  played  by  such 
implicit  underlying  assumptions,  existing  and  new  research  on  the  perception  of  formant  transitions  in  speech  will  be  examined  in 
terms  of  basic  assumptions  whose  recognition  can  modify  (and  sometimes  strengthen)  conclusions  about  higher  levels  of  perceptual 
processing. Discussion will focus on the implications of fundamental assumptions for the identification and demonstration of
important principles of perceptual organization (e.g., Gestalt, feature integration) and for testing hypotheses about alternative perceptual
models, modes, or modules. [Research supported in part by NSF and AFOSR.]

Invited Paper presented at Acoustical Society of America,
Ottawa, Ontario, Canada
May 17, 1993 (2pMU3)


This paper is a little different from the preceding two papers in this session (Bregman, 1993; Darwin, 1993). Al Bregman
provided a summary of the nature of modern auditory organization research as illustrated by some of the excellent, influential research
conducted in his laboratory and by others. Chris Darwin then provided a summary of his creative research demonstrating the role of
auditory organization principles in the perception of complex signals, including speech and music. I will shift focus to spend some time
addressing some potential problems and pitfalls in modern research on auditory organization, and then suggest some possible solutions
to the problems. I should note that in examining potential problems, one also can more fully appreciate the elegance of the research
just described by Bregman and Darwin.

Much of the recent effort to evaluate the higher level processing of complex signals has been couched in terms of general
perceptual principles derived by Gestalt researchers early in this century. This is a noble effort, and the five sessions at this meeting
speak to the modern importance of these efforts. However, there are some important limitations to this approach which must be kept
in mind in attempting to draw strong conclusions.

Although working primarily with visual stimuli, Gestalt researchers faced several problems which can exist in modern
auditory research efforts. One major problem was that the Gestalt research was based upon the analysis of static visual stimuli, and
some of the Gestalt conclusions did not always readily generalize to dynamic stimuli. This problem is enhanced in attempts to apply
Gestalt principles to audition, with the researcher often beginning with predictions (or even assumptions) based upon an analysis of a
static visual representation of the auditory stimulus displayed in terms of time, frequency, and intensity. The visual representation of
auditory stimuli makes the initial analysis potentially even more removed from perceptual reality than the early Gestalt work on visual
perception. Thus, predictions using general perceptual principles may sometimes be based upon artificial, static representations of
the waveform which have not taken into account critical perceptual limitations or important dynamic perceptual properties.

The second, and somewhat related, problem with modern research applications reflects a basic premise of the Gestalt
approach. The Gestalt school developed as a reaction to previous research efforts which had made strong assumptions about the
nature of the critical units of sensation and perception. One basic premise of the Gestalt approach is that the researcher needs to
allow perceptual results to define the basic units of perception. This essential feature of the Gestalt approach contrasts with the
researcher making assumptions about those basic units and then proceeding as though those assumptions were valid. However, the
Gestalt approach also can leave the researcher open to an inherent circularity, where the identified principles are used on a post hoc
basis to define the basic units used to demonstrate the principles. As a positive counter-example, I note that much of the work by
Bregman, and the work referenced in Bregman's excellent book (Bregman, 1991), is careful about letting perception define the
variables while avoiding the inherent problem of circularity.

What I hope to accomplish today is first to demonstrate (using speech research) some important examples of how
assumptions about features or basic units of perception can lead to erroneous conclusions. I then will focus on some important new
procedures which might be used to test notions about basic units of perception and thus avoid the problem of circularity.

In 1971 Mattingly and his colleagues published an important paper which has often been cited as a basis for contrast
between the perception of speech (e.g., phonemes in CV context) and analogous nonspeech stimuli. I will spend a few minutes
summarizing this study and its findings because, with the significant advantage of hindsight, that paper provides a basis for
understanding what may, or may not, be perceptually analogous conditions, and allows us to begin to come to grips with adequate
definitions of perceptual features or units.

Pastore - Acoustical Society, 2pMU3, May 1993 - 5/16/93

The top portion of FIGURE 1 provides a summary of the stimuli used in the Mattingly paper. The stimuli were two-formant
synthetic CV syllables which varied in onset frequency, and thus in the nature of the second-formant transition. Subjects
were asked to label the stimuli and to perform an oddity discrimination task between pairs of stimuli differing by a nominally equal step
size. The findings were fairly typical of the categorical perception literature. The labeling function (displayed as in the original paper
for one subject) exhibits relatively discrete labeling boundaries between the three categories of "b," "d," and "g." The discrimination
function (as in the original study, pooled across subjects) exhibits peaks which roughly correspond to the location of the category
boundaries.
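Why discrimination peaks should line up with labeling boundaries can be illustrated with the classic covert-labeling prediction for two categories in an ABX task, P(correct) = 0.5[1 + (pA - pB)^2], where pA and pB are the probabilities that stimuli A and B receive a given label. This is a simplification of the Mattingly design, which used an oddity task and three categories, and the labeling probabilities below are hypothetical.

```python
def predicted_abx(p_a, p_b):
    """Covert-labeling prediction of ABX accuracy for two categories.

    p_a, p_b: probability that stimulus A (or B) is given the same label, e.g. "b".
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)

# Hypothetical proportions of "b" responses along a 7-step continuum.
p_b_label = [0.98, 0.95, 0.80, 0.30, 0.05, 0.02, 0.01]

# Predicted accuracy for each one-step pair (stimuli 1-2, 2-3, ...).
pairs = [predicted_abx(a, b) for a, b in zip(p_b_label, p_b_label[1:])]
peak_pair = max(range(len(pairs)), key=pairs.__getitem__)

for i, p in enumerate(pairs):
    print(f"stimuli {i + 1}-{i + 2}: {p:.3f}")
print("peak at pair", peak_pair + 1)  # pair straddling the 50% labeling crossover
```

Within a category the prediction stays near chance (0.5), and it rises only for the pair whose labeling probabilities differ most, which is the categorical-perception pattern described above.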

The only part of the physical stimuli which varied was the second formant transition. Thus, analyzing the visual
representation of the physical stimuli, it seemed obvious that any perceptual, nonspeech basis for the phonetic contrast must be carried
by the transitions; all other parts of the stimuli were held constant. Subjects asked to label and to discriminate the isolated transitions
failed to exhibit categorical perception; these findings were interpreted as supporting the notion that the identical stimulus component
is perceived differently by speech and nonspeech systems. Mattingly et al. made an implicit assumption that the isolated
glide is the most likely basic perceptual unit for the phonetic continuum. This assumption is intuitively appealing, but clearly
ignores the basic premise of the Gestalt approach.

Notice that the Mattingly subjects might be perceiving a relative, rather than absolute, change in frequency across time.
That is, perception may require the incorporation of the subsequent steady-state portion of the stimulus represented by the vowel
formant. An ASA paper by Ralston and Sawusch (1983), and a more recent publication from my laboratory (Pastore, Li, & Layer,
1990), demonstrated that subjects experienced with sinewave stimuli yield relatively continuous perception for isolated transitions, but
categorical perception results when a following steady-state component is added. The three types of stimuli and the results for the
short bleat stimuli are summarized in FIGURE 2. Notice that the patterns of labeling and discrimination results closely parallel the
syllable results reported by Mattingly. Therefore, it would appear that the critical perceptual component of stimuli varying in place of
articulation may be a change in frequency leading into the relatively steady-state component. The notion of relative movement or
change in formant frequency as an important cue for place of articulation is something that Ken Stevens has been talking about
for a number of years.
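The contrast between an isolated transition and a transition leading into a steady state can be sketched as sinewave synthesis: a "chirp" is an isolated linear frequency glide, and a "bleat" is the same glide continuing into a steady-state tone. The 1800 to 1200 Hz glide, the durations, and the sampling rate are assumed values, not the parameters of Pastore, Li, and Layer (1990).

```python
import math

def glide_then_steady(f_onset, f_steady, glide_s=0.05, steady_s=0.2, fs=16000):
    """Sinewave analog: a linear frequency glide from f_onset into a steady tone at f_steady.

    With steady_s=0 the result is an isolated glide (a "chirp"); otherwise the glide
    leads into a steady-state segment (a "bleat"). Phase is accumulated sample by
    sample, so the waveform has no discontinuity at the glide/steady-state junction.
    """
    n_glide = int(round(glide_s * fs))
    n_total = n_glide + int(round(steady_s * fs))
    samples, phase = [], 0.0
    for i in range(n_total):
        f = f_onset + (f_steady - f_onset) * i / n_glide if i < n_glide else f_steady
        samples.append(math.sin(phase))
        phase += 2 * math.pi * f / fs
    return samples

# Assumed parameters: a falling 1800 -> 1200 Hz glide, 50 ms, plus 200 ms steady state.
chirp = glide_then_steady(1800, 1200, steady_s=0.0)
bleat = glide_then_steady(1800, 1200)
print(len(chirp), len(bleat))  # 800 4000
```

Because the bleat's first 800 samples are identical to the chirp, any perceptual difference between the two must come from the added steady-state segment, which is the logic of the comparison described above.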

A number of early studies on glides or transitions built upon the procedures used in the pioneering study by Brady,
House, & Stevens (1961), evaluating the pitch corresponding to isolated FM glides. These studies usually matched the pitch of a tone
to the pitch of a previously presented FM glide. Typical results are that perceived pitch tends to be carried more by the offset
frequency, with some differences between rising and falling glides. Notice that these studies parallel the original Mattingly
et al. assumption about the importance of isolated transitions (or chirps) as critical features, but add the further assumption that
the major acoustic role of a glide is in terms of an overall, static pitch quality.

A paper later in this session by Michael Hall (3pMU5) will present results of research with sinewave analogs. Pilot
conditions (which Mike will not present) indicate that if one reverses the ordering of stimuli in a pitch-matching study, matching a tone
to a subsequent, rather than previous, glide, subjects will tend to match pitch based more on the onset, rather than offset, frequencies
of the glides. In other words, subjects seem to match pitch on the basis of the most temporally contiguous portions of the stimulus.
Such a conclusion also implies that, when listening to glides, subjects are perceiving more than simply a single overall pitch quality, and
may not even hear an overall pitch quality. The former conclusion is generally consistent with often ignored aspects of the early
findings of Brady, House, and Stevens (1961). A later paper by Nabelek and Hirsh (1969) indicates that not only do subjects tend to
perceive continuous changes in frequency, but they even impose perception of a glide when given a short time period between two
steady-state stimuli differing somewhat in frequency. Therefore, perceptual results seem to argue that an important perceptual
characteristic of glides is something more than, or different from, overall pitch quality.
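One way to express the temporal-contiguity observation is a weighted average of the glide's endpoint frequencies on a log-frequency scale, with more weight on the endpoint nearer in time to the comparison tone. This is a hypothetical descriptive model, not one proposed in the paper, and the weight w_near = 0.7 is arbitrary; the text itself argues that a single overall pitch may not capture what listeners hear in glides.

```python
import math

def predicted_match(f_onset, f_offset, tone_follows_glide=True, w_near=0.7):
    """Hypothetical matched frequency (Hz) for a glide, weighting the temporally nearer endpoint.

    If the comparison tone follows the glide, the glide offset is the nearer endpoint;
    if the tone precedes the glide, the onset is nearer. Weighting is done on log2(Hz).
    """
    near, far = (f_offset, f_onset) if tone_follows_glide else (f_onset, f_offset)
    return 2 ** (w_near * math.log2(near) + (1 - w_near) * math.log2(far))

# Rising glide from 1000 to 2000 Hz (hypothetical values).
print(round(predicted_match(1000, 2000, tone_follows_glide=True)))   # pulled toward the offset
print(round(predicted_match(1000, 2000, tone_follows_glide=False)))  # pulled toward the onset
```

With w_near = 0.5 the model reduces to the geometric mean of the endpoints, so the weight is the single parameter that would have to be estimated from matching data.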

Looking at the physical stimuli from Mattingly (FIGURE 1), notice that we appear to have good continuation between the
glide and the following steady state. Again, we need to be careful not to draw conclusions about basic units of perception using
only the static visual representation of the auditory stimulus. I don't want to steal too much from Mike Hall's later paper, but one of
the questions is whether or not the presence of physical good continuation in the visual representation of the physical stimuli is
equivalent to perceptual good continuation.

As shown in FIGURE 3, one can somewhat offset the ending frequency of the transition relative to the steady-state portion
without disrupting the tendency to perceive the transition as appropriate for the phoneme normally defined in terms of continuity with
the following portion of the stimulus. In Fig. 3 the rising (falling) transitions are perceived as equivalent even though the center
frequency of one transition is equal to the starting frequency of the other. These results get back to the notion that an important
feature for some consonants may be related to a perception of motion (or change) in frequency, with direction being important.
However, we need to add a qualification, since Schouten (1986) found that the ability to perceive the direction of transitions may be
quite limited, and that perceived direction need not correspond to the physical direction of frequency change (reinforcing the point about
letting perception define cues).

The notion of good continuation may still apply, but only when defined in perceptual, rather than physical, terms.

Systematic research by Schouten (1985, 1986; Schouten & Pols, 1984), as well as a number of recent papers on perceptual limits for
transitions, may begin to provide some insights into when some aspects of transitions might begin to change in perceptual nature
(Elliott et al., 1991; Dooley & Moore, 1988; van Wieringen & Pols, 1991). We suspect that, as Chris Darwin has demonstrated for
similarity based upon fundamental frequency for vowel formants, a sufficient discrepancy in continuation can become perceptually
salient, thus leading to segregation, rather than integration, of perceptual elements. In fact, an earlier study by Repp and Bentin
(1984), although interpreted somewhat differently, demonstrated that, with sufficient frequency offset, transitions do begin to
perceptually segregate from the remainder of the synthetic CV syllables.

We now turn to the larger issue of defining critical aspects of perception, with the admonition that units of perception may
be a function of the level of perceptual analysis. One problem is a relative absence of research tools for determining possible basic units
of perception, or basic perceptual features of complex auditory stimuli. This issue is the focus of the remainder of the talk.




In speech perception there is a phenomenon called normalization which has been explored by Pisoni and his colleagues.
Normalization is really a modern analog of the classic notion of perceptual constancy. In speech it has been noted that perception of a
CV syllable or a word persists despite changes in talker, speaking rate, and a number of other properties of the source event.

Likewise, in music, one can perceive the equivalence of chords, or sequences of chords, played by different instruments. In the speech
literature, normalization is argued to be an active process which takes time to implement. Irrelevant changes in speaker or instrument
could be represented as added variability or noise, and thus a lower signal-to-noise ratio, which should slow processing but also reduce
accuracy. In normalization, the perceptual system is assumed to be able to evaluate, and possibly even anticipate, the nature of the
irrelevant variability. The system then factors out the irrelevant variability, thus restoring accuracy, or imposing constancy, on
perception. Notice that an anticipatory normalization process could potentially involve imagery. In fact, Crowder's excellent work on
musical imagery is based upon the perceptual system retrieving some sort of internal representation of the irrelevant stimulus
properties which are to be factored out.

In order to effectively investigate normalization, the researcher really needs to begin with a reasonable conceptualization of
the critical perceptual elements or features which are constant and those elements which are factored out. Alternatively, one might use
normalization as a tool to begin to evaluate conjectures about essential perceptual features of auditory stimuli. This Friday, Jennifer
Cho (5aMU5) will present a paper which looks at the relative roles of the attack and the upper partial in timbre normalization for
natural and edited music stimuli. As part of this study, Jennifer evaluated the relationship between perceived similarity between
stimuli defining instrument timbre and the normalization of instrument differences for the perception of chords. FIGURE 4 shows that the
reaction time measure for normalization was an inverse function of perceived similarity. Thus, similarity scaling and normalization
represent two very different procedures which can provide converging evidence in beginning to identify important perceptual features
of music and speech.
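The inverse relation between perceived similarity and normalization time can be summarized with a correlation across instrument pairs. The sketch below computes Pearson's r on hypothetical values (not data from the Cho study); a strongly negative r corresponds to the inverse relation described above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical instrument pairs: rated similarity (1-9 scale) and
# extra reaction time (ms) needed to normalize across the two instruments.
similarity = [8.1, 6.5, 5.0, 3.2, 2.0]
rt_cost = [40, 75, 110, 160, 190]
print(round(pearson_r(similarity, rt_cost), 3))
```

Pearson's r captures only the linear component of the relation; a rank correlation would be a reasonable alternative if the relation were monotonic but curved.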

Finally, I wish to turn to a model from cognitive psychology which was developed for visual stimuli, but which has potential
implications for our understanding of auditory perceptual processing, and which is definitely related to the issue of the definition of
features. In the early 1980s, Anne Treisman proposed her Feature Integration Theory (FIT) of perception. FIGURE 5 summarizes this
theory. In the basic task a number of different stimuli are presented simultaneously to subjects. Stimuli exhibit different combinations
of values along several dimensions such as size, shape, color, and location. According to FIT, the values of features are preattentively
extracted independently and in parallel, with rough tags (in a master map) for location. Therefore, search for single features among a
stimulus array occurs in parallel and RTs should be extremely rapid. For example, a yes-no task for the presence of a red or orange
stimulus should be performed very quickly and should be very little affected by the number of stimuli in the array (as long as the subjects are
normal trichromats and there are more than a few stimuli). Attention then is required for the serial task of conjoining the individual
features to result in the perception of objects at different locations. However, the search for a conjunction of features, such as a red
O, should be slower, especially as the number of stimuli is increased. Furthermore, when processing capacity is taxed, such as with a
large number of stimuli, one should often find the perceptual system erroneously conjoining features presented at separate locations in
the array. An illusory conjunction is the perception of an object whose features appear in the stimulus array, but never together.

Mike Hall, Wenyi Huang, Barbara Acker, and I have just completed a study (which we will report at a future meeting) using
musical stimuli varying in instrument, pitch played, and lateralized position in the head. Our results show the patterns of reaction
time described by Treisman for conjoined-feature searches. We also found illusory conjunctions of features at rates and levels of
subject confidence which are equivalent to those reported by Treisman for visual stimuli.

Notice that FIT relies upon the appropriate definition of features, and that features are processed in a manner different from
perceptual elements which are really conjoined features. In fact, some aspects of our results are best interpreted as our using complex,
conjoined features, rather than basic perceptual features, for some of the "single-feature" pitch and instrument searches. Although it is
interesting, and not really surprising, that one should find similar higher level processes for both visual and auditory (specifically
music) stimuli, the techniques used to demonstrate FIT represent another approach to beginning to identify perceptual features.

I now return to my initial point that Gestalt researchers warned that proper conclusions about principles of perception need
to begin with the identification of perceptual units defined by the perceptual system, and not simply by the researcher's intuition.
When such units of perception have been defined properly, we will best understand those aspects of Gestalt principles which describe
the perception of speech and music stimuli. The important point is that an adequate identification of the basic features of perception
is critical for the development and evaluation of all models of complex auditory perception. In this paper we have looked at recent
research developments that provide a number of new techniques which can be used to evaluate potential definitions of features and
which can provide converging evidence in support of that type of evaluation.

Acknowledgement

Research described in this paper was supported in part by grants F49620-93-1-0033 and F49620-93-1-0327 from the Air Force Office
of Scientific Research and grant BNS8911456 from the National Science Foundation. The opinions, results, and conclusions are those
of the author and do not necessarily represent those of either granting agency.


References 


Darwin, C.J., & Gardner, R.B. Perceptual separation of speech from concurrent sounds. In M. Schouten (Ed.), The
Psychophysics of Speech Perception. Nijhoff: Boston.




Pastore-  Acoustical  Society,  2pMU3,  May  1993  -  5/16/93 


Dooley, G.J., & Moore, B.C.J. (1988). Detection of linear frequency glides as a function of frequency and duration.
Journal of the Acoustical Society of America, 84, 2045-2057.


Elliott, L.L., Hammer, M.A., & Carrell, T.D. (1991). Discrimination of second-formant-like frequency transitions.
Perception & Psychophysics, 50, 1-6.

Mattingly, I.G., Liberman, A.M., Syrdal, A.K., & Halwes, T. (1971). Discrimination in speech and nonspeech modes. Cognitive
Psychology, 2, 131-157.

Nabelek, V., Nabelek, A.K., & Hirsh, I.J. (1970). Pitch of tone bursts of changing frequency. Journal of the Acoustical Society of
America, 48, 536-553.

Pastore, R.E. (1981). Possible psychoacoustic factors in speech perception. In P.D. Eimas & J.L. Miller (Eds.), Perspectives on the
Study of Speech. Hillsdale, NJ: Erlbaum. Chap. 5.

Pastore, R.E., Li, X.-F., & Layer, J.K. (1990). Categorization of chirps and bleats: their similarity to speech. Perception &
Psychophysics, 48, 151-156.

Porter, R.J., Cullen, J.K., Collins, M.J., & Jackson, D.F. (1991). Discrimination of formant transition onset frequency:
Psychoacoustic cues at short, moderate, and long durations. Journal of the Acoustical Society of America, 90, 1298-1308.

Repp, B.H., & Bentin, S. (1984). Parameters of spectral/temporal fusion in speech perception. Perception & Psychophysics, 36,
523-530.


Schouten, M. (1985). Identification and discrimination of sweep tones. Perception & Psychophysics, 37, 369-376.


Schouten, M. (1986). Three-way identification of sweep tones. Perception & Psychophysics, 40, 359-361.

Schouten, M., & Pols, L. (1984). Identification of intervocalic plosive consonants: The importance of plosive bursts vs. vocalic
transitions. In M. van den Broecke & A. Cohen (Eds.), Proceedings of the 10th International Congress of Phonetic Sciences.
Dordrecht: Foris Publications, 4^4-468.


Treisman, A.M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.


Wertheimer, M. (1958). Principles of perceptual organisation. In D.C. Beardslee & M. Wertheimer (Eds.), Readings in Perception
(pp. 115-135). New York: Van Nostrand.


Wieringen, A. van, & Pols, L.C.W. (1991). Transition rate as a cue in the perception of one-formant speech-like synthetic stimuli.
Proceedings of the XIIth International Congress of Phonetic Sciences, Aix-en-Provence, Vol. 3, 446-449.


[Figure 2: Percent as a function of F2 onset frequency (Hz), for Mattingly et al. (1971) and Pastore et al. (1990); frequency (Hz) by time (ms) schematics of the stimuli, including Short Bleat and Bleat.]

[Figure 4: Music Normalization: RT versus Similarity Rating (reaction time in ms plotted against similarity rating).]

[Figure 5: Feature Integration Theory. Single Feature Search: red (valid) vs. orange (invalid); X (valid) vs. Y (invalid). Conjunction Search: red "O" (valid), orange "Y" (invalid), green "J" (illusory conjunction).]


Measuring the DL for Identification of Order of Onset for Complex Auditory Stimuli
Richard  Pastore,  Shannon  Farrington,  and  Sajni  Jassal 


Abstract 

Discussion of the approximately 20 ms threshold for the identification of the order of onset of components of auditory
stimuli has ranged from consideration of the absolute threshold (RL) as a possible factor contributing to the perception of voicing
contrasts in speech to claims that the threshold is a methodological artifact. The current research investigates the identification of
the temporal order of onset in terms of the Difference Limen (DL) for complex stimuli (modeled after CV syllables) that vary in the
relative onset of their components. The results provide clear evidence that the DL at relatively short onset differences (less than 25 ms)
follows predictions based upon a perceptual threshold or limit. Furthermore, the DL seems to be a function of context coding of stimulus
information, with both the DL and RL probably reflecting limits on the effective perception and coding of the short-term stimulus
spectrum.

End  of  Abstract 


Running  Head:  DL  for  Order  Onset  Identification 


Somewhat over three decades ago, Hirsh reported new findings on some temporal limits of perception that have come
full circle: from being ignored, to being widely cited, then often misunderstood, and finally again largely ignored. Hirsh (1959) found
that there is a series, or hierarchy, of temporal limits on perception. For monaural (or diotic) presentation conditions, a difference
of approximately 2 msec was required for detecting an onset asynchrony of two auditory stimuli (defining a threshold for
simultaneity), but a difference of approximately 20 msec in onset was required to identify which of the stimuli had the earlier onset
(defining a threshold for order of onset). This latter Temporal Order Threshold (TOT) can be contrasted with a threshold of several
hundred msec for correctly assigning order labels to a sequence of four or more stimuli repeated in sequence (Warren, 1982).
Moving in the other temporal direction, a difference of only a few microseconds is required to detect a difference in onset for
identical stimuli presented to the two ears, with this threshold really reflecting a difference in lateralization (Hirsh, 1974). Each of
these thresholds has been replicated a number of times by different researchers using a variety of psychophysical
procedures. Hirsh's major point was that there are a number of different types of temporal limitations, with the two shorter
thresholds probably reflecting sensory limitations, and the longer limits reflecting perceptual or even memory limitations on the
processing of stimuli. These observations are important in their own right.

The focus of the current research is the approximately 20 msec TOT, which for a number of years had been
of extra interest because of its potential relationship to the perception of initial-position stop consonants of English that are
contrasted in voicing. Hirsh (1959) had observed that such stop consonants typically exhibit a labeling or categorization boundary
when voicing onset is delayed by approximately 20 msec relative to the release or onset of the syllable, and he conjectured
that the TOT limitation may be an important underlying basis for voicing contrast. This conjecture makes a great deal of sense: the
threshold implies that although subjects may be able to detect a difference in onset for various components of a complex signal
(e.g., speech), an onset difference of at least 20 msec (in ideal laboratory situations) is required to reliably identify which component
had the earlier (or later) onset. Thus, an initial consonant is voiceless if the higher-frequency or aspirated components are perceived
as having an onset before either the lower-frequency (F1) or the voiced components. Later research demonstrated categorical
perception along a temporal onset continuum for noise-buzz stimuli (Miller, Wier, Pastore, Kelly, and Dooling, 1976) and for pairs
of tones (Pisoni, 1977). The pattern of categorical perception for these stimuli was fairly similar to that reported earlier for synthetic
speech stimuli varying in Voice Onset Time (VOT). Thus, the 20 msec TOT was of interest both as one of a limited set of temporal
limits on perception and because of its potential role in voicing contrast, the latter defining the TOT hypothesis.

Criticisms  of  the  TOT  Hypothesis 

The first major criticism of the TOT hypothesis as the auditory basis for the perception of voicing contrast was based upon
a systematic study by Summerfield (1982). Hirsh (1959) had found that TOT was independent of stimulus type. A number of later
studies had demonstrated that the VOT boundary for American English syllables was a function of place of articulation, varying from
24-28 msec for labial stops to 28-52 msec (or more) for velar stops (for review, see Pastore, 1987a). Summerfield built upon these
observations, demonstrating that TOT for tones and noise-buzz stimuli was relatively constant across, and thus independent of, many of
the frequency parameters that have been demonstrated to be important for changes in place of articulation. Furthermore, the TOT
boundary was at a much shorter duration than reasonable

If one adopts an extreme view of the TOT hypothesis, positing that voicing contrast is based solely upon whether or
not the listener can perceive the order of onset of specific components, then the Summerfield study represents clear evidence
against this hypothesis. However, a weaker version of the TOT hypothesis is still reasonable. According to the weaker version of
the hypothesis, perception of an order of onset is an important contributing factor or cue for the perception of voiceless quality, but
it is not the only cue. This multiple-cue notion has a reasonable basis in the general finding that all speech categories are based
upon the contribution of a number of different features or cues, with the perception of voicing contrast reflecting the action of up
to 16 different cues (Lisker et al., 1977). If voicing contrast, or any other phonetic contrast, were based upon any single factor or
threshold, then decades of research by outstanding scientists would long ago have succeeded in identifying the single factor
determining that contrast. Furthermore, there are logical grounds for conjecturing that a temporal order limitation is a
contributing factor to the perception of voicing contrast. Specifically, production studies of English stop consonants have found
distributions of voiced tokens that are skewed toward simultaneity, with very few tokens having VOT values even approaching the
approximately 20 msec TOT, and distributions of voiceless tokens skewed toward higher VOT values, again avoiding the
ambiguous region near the TOT limitation. One critical question, then, concerns which specific components of the stimuli
might be perceived in an ordered fashion.

Although it is not necessary for voicing contrast to be based upon the ordered perception of the same limited set of
stimulus components for all places of articulation, a logical starting point for a common temporal contrast would
be F1 cutback, in which the low-frequency first-formant resonance is not activated until after the onset of voicing, while
the higher formants are activated within a very brief period following the release of the consonant. An alternative candidate might
be whether or not the release burst is perceived as occurring clearly before the onset of voicing. The current research focuses on
the first of these two conjectures.

Following the work of Summerfield (1982), several studies demonstrated that the TOT can be a function of stimulus
parameters, especially when those parameters are consistent with properties exhibited in natural and synthetic
speech stimuli. Hillenbrand (1984) demonstrated some variation in TOT for stimuli with dynamic changes in frequency at onset,
thus mimicking with tonal stimuli the frequency changes associated with F1, F2, and F3 transitions. Pastore et al. (1981; 1988)
demonstrated significant changes in the TOT for stimuli with dynamic frequency transitions at onset, coupled with variation in rise
time and the presence of an initial release burst. For stimuli with slow rise times (typical of speech stimuli) and strong initial release
bursts, TOT, measured using a two-alternative forced-choice procedure, was found to be as long as 40 msec for certain types of
stimuli, especially those patterned after velar stops contrasted in voicing. These results would all seem to support the weak version
of the TOT hypothesis.

Recent  Criticisms  of  TOT 

In the late 1980s there were two new criticisms of the TOT hypothesis, both of which argued against the existence of an
approximately 20 msec threshold for temporal order identification. The argument by Rosen and Howell (1987) is a theoretical one
based upon methodological issues. The argument by Kewley-Port, Watson, and Foyle (1988) is based upon empirical work under
minimal uncertainty conditions. In addition, the excellent text on hearing by Moore (1989) simply does not recognize a limitation
on temporal order, instead citing only the 2 msec boundary for simultaneity and the 200 msec boundary for labeling the individual
components in a sequence of stimuli. These arguments about the absence of a 20 msec temporal order threshold ignore the
replication of the threshold using a number of different psychophysical procedures. Nevertheless, the heart of each of the major
criticisms will be addressed before the current research effort is summarized.


Hirsh  Methodology 

Rosen  and  Howell  (1987)  argue  that  the  Hirsh  (1959)  procedure  is  really  a  labeling  task,  and  that  when  the  original  Hirsh 
results  are  replotted  in  terms  of  labeling,  one  obtains  a  50%  category  boundary  at  approximately  onset  synchrony  (0  msec  onset 
difference),  with  no  indication  of  any  threshold  or  limitation  at  approximately  20  msec. 

Hirsh used the standard psychophysical methodology of the time to measure a difference threshold (Difference Limen, or
DL). Once one understands the assumptions underlying the measurement of a DL, and the limitations with which Hirsh (1959) had to deal,
one sees that the Rosen and Howell argument rests upon the implicit, and testable, assumption that the relevant DL does not
exist. Relevant notions about measuring the DL can be found in important texts from the time of Hirsh's research (e.g., Osgood,
1953; Woodworth and Schlosberg, 1954) and are summarized in Figure 1 (panel A). According to classic notions, there is a region
around the Point of Objective (or Physical) Equality (POE) within which subjects perceive an equivalence to the standard or cannot
reliably identify a difference from the standard. This region of perceptual equality is called the interval of uncertainty; its midpoint
is called the Point of Subjective Equality (PSE), and the DL is half the width of the interval of uncertainty. In the case of the research
by Hirsh (1959), there really are two regions of uncertainty surrounding temporal onset synchrony: a very narrow region
for the perception of synchrony (an interval of uncertainty of 4 msec, hence a DL of 2 msec) and a broader region within which subjects
cannot reliably identify the order of onset (an interval of uncertainty of 40 msec, hence a DL of 20 msec). Beyond this region in each
direction, subjects can reliably discriminate stimulus differences. As with any threshold, there is a statistical distribution of response
probabilities, rather than a quantal or discrete change in the probability of response.
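The classic definitions above reduce to simple arithmetic on the two limits of an interval of uncertainty. The following is a minimal sketch, assuming Hirsh's intervals are symmetric about onset synchrony (POE = 0 ms); the function name `pse_and_dl` is ours, for illustration only.

```python
def pse_and_dl(lower_limit, upper_limit):
    """Classic bookkeeping for an interval of uncertainty.

    lower_limit, upper_limit: stimulus values (here, onset differences in ms)
    bounding the interval of uncertainty. Returns (PSE, DL): the PSE is the
    midpoint of the interval and the DL is half of its width.
    """
    if upper_limit < lower_limit:
        raise ValueError("upper_limit must not be below lower_limit")
    pse = (lower_limit + upper_limit) / 2
    dl = (upper_limit - lower_limit) / 2
    return pse, dl


# Hirsh (1959) as summarized in the text, limits assumed symmetric about synchrony.
print(pse_and_dl(-2, 2))    # simultaneity, 4-ms interval   -> (0.0, 2.0)
print(pse_and_dl(-20, 20))  # order of onset, 40-ms interval -> (0.0, 20.0)
```

An asymmetric interval shifts the PSE away from the POE without changing the DL, which is why the text distinguishes the two points.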

One method for measuring the DL is to ask subjects to adjust the stimulus magnitude either from a state of clear
perceptual difference until they first perceive equality, or from the stimulus of physical equality until they first perceive
the designated perceptual difference (e.g., order of stimulus onset). Either method of adjustment would define the two limits of the
interval of uncertainty, which are the boundaries between three response regions (perceptually lower, perceptually equal, and
perceptually higher). Unlike thresholds defined in terms of intensity, a TOT requires a discrete-trial situation in which it is difficult
to manipulate onset difference, especially using the technology available in the 1950s. Hirsh thus was forced to use the method of
constant stimuli. With the method of constant stimuli, the three response regions can be mapped using either a 3-category or a
2-category method (Osgood, 1953).


Insert  Figure  1  about  here 


Fig. 1A shows typical results for measuring the DL using a 3-category version of the method of constant stimuli. The
abscissa represents an arbitrary designation of the stimulus continuum, with zero being the stimulus standard (POE). In the case
of a temporal order continuum, zero would represent onset synchrony, with the continuum representing (in totally arbitrary units)
the relative onset of the two components of the stimulus (e.g., the lower-frequency component having an earlier, versus later, onset).
The three curves, each representing an assumed underlying category, plot the relative probability of responding with category A or B
(discriminated differences) or with the uncertain (or equality) category. In this hypothetical illustration the three response probabilities
are based upon Normal (uncertain category) and cumulative Normal or Gamma (categories A and B) distributions, with the
relative probabilities scaled to sum to unity for each stimulus. Using classic psychophysical methodology, the interval of uncertainty
can be estimated in terms of the stimulus values yielding 50% identification for categories A and B, with the DL being half of this
interval. Since the illustration in Fig. 1A plots an ideal interval of uncertainty that is symmetric around the POE (i.e., PSE
= POE), the value of the stimulus at the upper limit represents the DL, and is so indicated in Fig. 1A.
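The 3-category estimate can be sketched numerically. In place of the figure's scaled Normal and cumulative Normal/Gamma curves, this sketch swaps in a standard ordered-threshold (Thurstonian) model: the percept is the stimulus value plus Gaussian noise, and category boundaries sit at plus and minus a fixed value. The boundary (20 ms), the noise standard deviation (4 ms), and all function names are assumptions for illustration; the 50% points, PSE, and DL follow the classic definitions in the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def three_category_probs(x, boundary, noise_sd):
    """Return (p_low, p_uncertain, p_high) for stimulus value x.

    Hypothetical ordered-threshold model: the percept is x plus Gaussian noise
    (sd = noise_sd); it is labeled "low" below -boundary, "high" above
    +boundary, and "uncertain" in between, so the three values sum to 1.
    """
    p_low = STD_NORMAL.cdf((-boundary - x) / noise_sd)
    p_high = STD_NORMAL.cdf((x - boundary) / noise_sd)
    return p_low, 1.0 - p_low - p_high, p_high


def crossing(prob, target, lo, hi, tol=1e-9):
    """Bisection: the x in [lo, hi] where an increasing prob(x) reaches target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


BOUNDARY_MS, NOISE_SD_MS = 20.0, 4.0  # assumed values, for illustration only

# 50% points for the "high" (B) and "low" (A) categories; p_low falls with x,
# so the search runs on 1 - p_low, which rises with x.
upper = crossing(lambda x: three_category_probs(x, BOUNDARY_MS, NOISE_SD_MS)[2], 0.5, 0.0, 100.0)
lower = crossing(lambda x: 1.0 - three_category_probs(x, BOUNDARY_MS, NOISE_SD_MS)[0], 0.5, -100.0, 0.0)

pse = (lower + upper) / 2  # midpoint of the interval of uncertainty
dl = (upper - lower) / 2   # half of its width
print(f"interval {lower:.2f} to {upper:.2f} ms; PSE {pse:.2f} ms; DL {dl:.2f} ms")
```

Because this model is symmetric about zero, the upper 50% point and the DL coincide, as the text notes for Fig. 1A.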

However, in measuring TOT the separate threshold for simultaneity represents a major problem in defining allowable
responses for subjects. On a conceptual basis, as one increases the onset difference (in either direction) from onset synchrony,
perception changes from (1) synchrony, to (2) asynchrony coupled with an inability to identify order of onset, and finally to (3) a clearly
ordered onset. The simplest and most understandable instruction to the subjects is to indicate the order of onset, with threshold
measured on the basis of correctness of response. This approach, which was used by Hirsh (1959), follows the standard 2-category
method for measuring the DL (Osgood, 1953). In the case of Hirsh's temporal order study, the targeted middle category is not one
of simultaneity (4 ms wide), but rather the broader (40 ms) range within which order of onset cannot be correctly perceived; the middle
response category therefore could not indicate equality of onset, but only a failure to accurately identify order of onset. Thus, the
3-category procedure could not be effectively implemented.




The logical alternative procedure (e.g., Osgood, 1953) is to use only two response categories (the "low" and "high"
categories from Fig. 1A). The resulting psychometric functions, derived from the three distributions in Fig. 1A, are plotted in
the lower panel (B) of Fig. 1. "Low" responses should occur either when the subject perceives a "low" event (reflected in the
p(low) curve in Fig. 1A) or when the subject fails to perceive either category (reflected by the uncertain, or neither, category
in Fig. 1A) and guesses "low." Assuming an equal probability of guessing either response when sampling from the uncertain (neither)
category, the probability of responding "low" for any given stimulus can be derived from Fig. 1A and should equal p(low) plus
one-half of p(neither). The resulting 2-category psychometric functions are plotted in Fig. 1B. Notice that the two
psychometric functions no longer resemble a cumulative Normal distribution, but rather show distinctive deviations (e.g., they are
shallower, rather than steep, in slope) for stimuli falling within (and near) the interval of uncertainty. The 50% point for these
2-category psychometric functions represents the point of subjective equality (PSE), which, for the symmetric case illustrated,
should at least approximate the POE, and thus onset synchrony.

Because the 2-category procedure forces guessing, the points that had been represented by the 50% performance values
in Fig. 1A now must be estimated from the 75% points in Fig. 1B (i.e., 50% correct responding plus correct guesses on half of the
remaining 50% of trials). The 75% criterion for the 2-category procedure yields essentially the same DL estimate as the 50% criterion
with the 3-category procedure (as illustrated in Fig. 1). Rosen and Howell (1987) have made the implicit assumption that the middle
category (interval of uncertainty) does not exist, and have taken the correspondence between the PSE and POE as supporting evidence.
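The collapse from three categories to two, and the 75% criterion, can be checked with a small sketch. It assumes the same hypothetical ordered-threshold model as a stand-in for Fig. 1A (boundaries at plus and minus 20 ms, Gaussian noise with a 4 ms standard deviation); these values and the function names are illustrative, not taken from the original study. Uncertain percepts are split evenly between the two allowed responses, exactly as the text describes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_category_probs(x, boundary, noise_sd):
    """Return (P("low"), P("high")) when only two responses are allowed.

    Hypothetical ordered-threshold model (category boundaries at +/-boundary,
    Gaussian noise with sd = noise_sd). Percepts in the uncertain ("neither")
    region are split evenly by guessing:
    P("low") = p(low) + p(neither)/2, P("high") = p(high) + p(neither)/2.
    """
    p_low = STD_NORMAL.cdf((-boundary - x) / noise_sd)
    p_high = STD_NORMAL.cdf((x - boundary) / noise_sd)
    p_neither = 1.0 - p_low - p_high
    return p_low + p_neither / 2, p_high + p_neither / 2


def crossing(prob, target, lo, hi, tol=1e-9):
    """Bisection: the x in [lo, hi] where an increasing prob(x) reaches target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


BOUNDARY_MS, NOISE_SD_MS = 20.0, 4.0  # assumed values, for illustration only


def p_high(x):
    return two_category_probs(x, BOUNDARY_MS, NOISE_SD_MS)[1]


pse = crossing(p_high, 0.50, -100.0, 100.0)  # 50% point: PSE
upper = crossing(p_high, 0.75, 0.0, 100.0)   # 75% point: upper limit of uncertainty
print(f"PSE {pse:.3f} ms; 75% point {upper:.3f} ms; DL estimate {upper - pse:.3f} ms")
```

Under this model, at the 3-category upper limit (where p(high) = 0.5) the 2-category value is 0.75 minus half of p(low), so the two criteria agree closely whenever p(low) is negligible at that point, as it is here.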

The functions in Fig. 1B also provide us with an important insight into the category structure being used by subjects. The
deviation from a reasonable cumulative Normal distribution found in Fig. 1B can be interpreted as a clear indication that the
subjects have an additional, intervening category, which they are distributing between the two allowable response categories. Hirsh
(1959) used the 2-category procedure, and his results (scaled in terms of Z-scores) exhibit this critical characteristic (a linear
function with a bend, or elbow). Hirsh's findings thus indicate the existence of the implied intervening category. It should
also be noted that such deviations from a reasonable cumulative Normal distribution are fairly typical in the labeling results
for a number of different speech continua.
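The bend in Z-score units can be illustrated under the same hypothetical ordered-threshold model (boundaries at plus and minus 20 ms, noise standard deviation of 4 ms, both assumed). The sketch converts the 2-category P("high") to Z-scores with the inverse cumulative Normal and compares the slope inside the interval of uncertainty with the slope just beyond its upper limit; a single cumulative Normal would give one constant slope everywhere.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
BOUNDARY_MS, NOISE_SD_MS = 20.0, 4.0  # assumed values, for illustration only


def p_high_two_category(x, boundary=BOUNDARY_MS, noise_sd=NOISE_SD_MS):
    """2-category P("high") under a hypothetical ordered-threshold model:
    p(high) + p(neither)/2, which equals 0.5 + (p(high) - p(low)) / 2."""
    p_low = STD_NORMAL.cdf((-boundary - x) / noise_sd)
    p_high = STD_NORMAL.cdf((x - boundary) / noise_sd)
    return 0.5 + (p_high - p_low) / 2


def z_slope(x1, x2):
    """Slope of the Z-score (probit) transformed psychometric function between x1 and x2."""
    z1 = STD_NORMAL.inv_cdf(p_high_two_category(x1))
    z2 = STD_NORMAL.inv_cdf(p_high_two_category(x2))
    return (z2 - z1) / (x2 - x1)


inside = z_slope(-10.0, 10.0)  # within the interval of uncertainty
outside = z_slope(20.0, 26.0)  # just beyond its upper limit
print(f"Z-score slope: {inside:.4f}/ms inside vs {outside:.4f}/ms outside")
```

The much shallower slope inside the interval is the kind of elbow the text attributes to an intervening category; the exact shape depends on the assumed parameters.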

Sensory  or  Perceptual  Limit 

Kewley-Port, Watson, and Foyle (1988) have argued from a different perspective that there is no temporal order threshold,
at least not at the limits of sensory capabilities. Their study reports a lack of evidence for any temporal order limit of
approximately 15 to 20 msec under minimal uncertainty conditions using highly practiced subjects. The Kewley-Port et al. study
is a solid research project on perception under minimal uncertainty conditions. Although we have questioned whether their results
truly lack evidence for a threshold at approximately 15 to 20 msec (Pastore, 1988; see also Watson & Kewley-Port,
1988), the important issue for the current study is that Hirsh (1959), and subsequent research on temporal order perception
(e.g., Miller et al., 1976), have clearly argued that the temporal order limitation is perceptual, and does not represent a sensory
limitation (we will return later to the nature of such a perceptual limitation). Therefore, the issue of whether or not one
finds evidence for a temporal order identification threshold of approximately 15 to 20 msec at the limits of sensory capabilities,
although interesting in its own right, is really irrelevant to perceptual research focusing on this threshold and its possible role in the
perception of voicing contrasts. Furthermore, when a finding has been often replicated under a wide variety of more natural
conditions, a single failure to replicate under extreme conditions cannot falsify the more typical findings.

Kewley-Port et al. also addressed temporal order perception (under minimal uncertainty conditions) in terms of the size
of the Weber fraction, which is an important approach to studying TOT. If there is no TOT, then the function relating the just noticeable
difference in onset, $\Delta t$, to the standard onset difference, $t_i$, should be a Weber function, and thus linear with a positive slope
corresponding to the Weber constant. From a separate, theoretical basis, Pastore (1987b) had also addressed the nature of TOT as an
absolute perceptual threshold, incorporating the notion of a Weber fraction. This analysis conjectures a two-stage function relating
$\Delta t$ to $t_i$. For relatively small (below-threshold) onset time differences, the size of a just noticeable difference in temporal
order of onset, $\Delta t$, should equal the difference between the given stimulus, $t_i$, and the absolute threshold for order
discrimination, $t_0$ (i.e., any subliminal stimulus should be just discriminable from the stimulus that first exceeds threshold). This
relationship is described by the formula

$$\Delta t = (t_0 - t_i) + c, \qquad t_i < t_0 \qquad (1)$$

There really are two different values of $t_0$. One value of $t_0$ represents the threshold for simultaneity and is
approximately 2 msec. The other value is the temporal order identification threshold, which, for relatively stationary stimuli such
as those used by Hirsh (1959), Pisoni (1977), and Miller et al. (1976), would be approximately 20 msec. However, with dynamic
stimuli similar to those used by Pastore et al. (1988), $t_0$ would be at a larger temporal order difference, possibly up to 45 msec.
In this equation the constant, $c$, represents noise or variability in the system. This formula, based upon quantal notions of
absolute threshold, predicts that the size of the just noticeable change in temporal onset should decrease linearly (slope of -1.0)
for temporal onsets up to the threshold for temporal order identification. Because we are dealing with a psychophysical
procedure based upon probability distributions, we ran a simulation of observer behavior in a 2IFC task measuring $\Delta t$. The
simulation assumed that judgement is based upon whether or not threshold ($t_0$) is exceeded (as in Eq. 1). In this simulation an
underlying Gaussian distribution of noise was assumed, with the stimuli spaced at equal Z-score distances. The additional
constraint on the simulation was that subjects would guess (p = 0.5) when both stimuli were either below or above threshold
(with the added implicit assumption that two compared supraliminal stimuli differ by less than the DL for supraliminal
stimuli). Psychometric functions were generated, then used to obtain the 75% estimate of $\Delta t$ for each value of $t_i$ up to a value
of $t_0$. The simulation indicated that the measurement procedure should result in a linear function, but with a slope of
approximately -0.5 and with $\Delta t = 0$ at a value of $t_i$ that exceeds the theoretical value of $t_0$ by a small amount.
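The decision rule in the simulation can be written out for a single 2IFC trial. This is a minimal sketch of the rule as described, not the authors' simulation code: the noise standard deviation is an assumed parameter, the function name is hypothetical, and reproducing the reported slope would additionally require the stimulus spacing and estimation steps of the original simulation. If exactly one interval exceeds $t_0$ the observer picks it; otherwise the observer guesses. Algebraically this gives P(correct) = 0.5 + 0.5(p_cmp - p_std).

```python
from statistics import NormalDist


def p_correct_2ifc(t_standard, t_comparison, t0, noise_sd):
    """P(correct) in one 2IFC trial for a hypothetical high-threshold observer.

    Each interval's onset difference is perturbed by Gaussian noise
    (sd = noise_sd, an assumed parameter) and is either above or below the
    threshold t0. If exactly one interval exceeds t0, the observer picks it;
    if both or neither do, the observer guesses (p = 0.5). "Correct" means
    picking the comparison, the interval with the larger onset difference.
    """
    noise = NormalDist(0.0, noise_sd)
    p_cmp = 1.0 - noise.cdf(t0 - t_comparison)  # comparison exceeds t0
    p_std = 1.0 - noise.cdf(t0 - t_standard)    # standard exceeds t0
    picks_comparison = p_cmp * (1.0 - p_std)    # only the comparison exceeds t0
    guess_trials = p_cmp * p_std + (1.0 - p_cmp) * (1.0 - p_std)
    return picks_comparison + 0.5 * guess_trials


# Assumed values: t0 = 20 ms, noise sd = 2 ms.
print(p_correct_2ifc(0.0, 20.0, 20.0, 2.0))  # comparison at threshold: about 0.75
print(p_correct_2ifc(0.0, 4.0, 20.0, 2.0))   # both subliminal: about 0.5 (guessing)
```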

This derivation provides one contrast in predictions: whether the function relating $\Delta t$ to $t_i$ (for smaller values of $t_i$) has
a slope of -0.5 or less (threshold model) or has a positive slope (Weber function). If we assume the validity of Weber's law for
temporal differences above this threshold, then the remainder of the differential sensitivity function should follow the equation:

$$\Delta t = k\,(t_i - t_0) + c, \qquad t_i > t_0 \qquad (2)$$

In Eq. 2, $k$ is the Weber constant, with all of the other parameters being equivalent to those in Eq. 1. Both equations for the
threshold notion predict that the difference threshold should reach a minimum value of $c$ when the standard, $t_i$, approximates $t_0$.
Eq. 1 predicts a linear decrease in the size of the just noticeable difference up to $t_0$, with Eq. 2 predicting a linear increase in the
size of the DL for values of $t_i > t_0$. Thus, the second major difference in predictions is that the threshold model requires two
linear functions whose intersection represents a minimum value of $\Delta t$ near threshold ($t_0$), whereas the no-threshold model
based upon a Weber fraction predicts a single linear function with a positive slope.
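The two competing predictions can be tabulated side by side. This sketch assumes Eq. 2 has the form $\Delta t = k(t_i - t_0) + c$, the reading implied by the statement that both threshold equations reach their minimum, $c$, at $t_i \approx t_0$. The parameter values ($t_0$ = 20 ms, $c$ = 2 ms, $k$ = 0.1), the optional intercept in the Weber model, and the function names are hypothetical, chosen only to show the shapes.

```python
def jnd_threshold_model(t_i, t0, c, k):
    """Predicted JND in onset difference (ms) under the two-stage threshold model.

    Below threshold, Eq. 1: delta_t = (t0 - t_i) + c (slope -1).
    Above threshold, assumed form of Eq. 2: delta_t = k * (t_i - t0) + c.
    Both branches equal c at t_i = t0, the predicted minimum.
    """
    if t_i < t0:
        return (t0 - t_i) + c
    return k * (t_i - t0) + c


def jnd_weber_model(t_i, k, c=0.0):
    """No-threshold alternative: one line with positive slope k (Weber constant).
    The intercept c is an assumption; set it to 0 for a pure Weber function."""
    return k * t_i + c


T0, C, K = 20.0, 2.0, 0.1  # assumed values (ms, ms, unitless)
for t_i in (0, 10, 20, 30, 40):
    print(t_i, jnd_threshold_model(t_i, T0, C, K), jnd_weber_model(t_i, K))
```

The threshold model yields a V-shaped function with its minimum at $t_0$, while the Weber model rises monotonically; note that this uses the idealized slope of -1 from Eq. 1, not the shallower slope the simulation predicts for measured data.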

There  are  a  number  of  different  issues  to  be  addressed  in  the  current  research.  The  first  issue  is  whether 
discrimination  results  for  the  perception  of  order  of  onset  follow  the  pattern  predicted  by  the  functioning  of  an  absolute 
threshold.  The  second  issue  relates  to  the  degree  to  which  context  coding  plays  a  role  in  the  identification  of  order  of  onset. 

This issue builds upon notions that were originally developed by Durlach and Braida (1969) and have more recently been
applied to speech in the work of Macmillan, Braida, and Goldberg (1987; Uchanski, Millier, Reed, and Braida, 1992). This second
issue is of interest not only in its own right, but also because recent work by Schouten and van Hessen (1992) reported results
for the discrimination of phonemes contrasted in place of articulation that seem to indicate a lack of trace coding (thus, use of
only context coding), with none of the differences that would be predicted on the basis of the psychophysical task employed.

General  Methods 

Subjects 

A total of eight subjects were hired for this task. All had normal hearing and were native speakers of English. All were of student age and considered participation to be a part-time summer position. Subjects were run in commercial sound chambers and listened to the binaural stimuli over TDH-49 earphones.

Stimuli 

Two sets of stimuli were synthesized (10 kHz sample rate, 12-bit converter). The stimuli were sinewaves whose frequency and amplitude changed over time in a manner consistent with that described for the CV syllables used by Volaitis and Miller (1992). All of the stimuli had two major components, corresponding to the F1 and F2 resonances. The F1 component began at 180 Hz and rose linearly to a steady state of 330 Hz over the first 20 ms of the tonal portion of the stimulus. The frequency of this low-frequency (F1) component remained at 330 Hz for 180 ms, then declined to 300 Hz over the last 100 ms. The frequency of the higher-frequency (F2) component followed a similar temporal pattern of frequency change. It either rose from an initial frequency of 1800 Hz to a steady-state frequency of 2200 Hz, or fell from an initial frequency of 2400 Hz to the same steady-state frequency. Therefore, the two types of stimuli (Rising and Falling F2 stimuli) differed in the onset frequency of the higher-frequency component. A given subject listened to only one of the two stimulus types (Rising or Falling F2) and ran under both fixed and roving discrimination conditions.

Therefore, stimulus type was a between-subjects factor, whereas discrimination condition (fixed vs. roving) was a within-subject factor.

The amplitude of the original F1 and F2 stimuli rose as a linear function of time over the first 45 ms. After maintaining a steady-state amplitude for 155 ms, the amplitude fell to zero (ground) over the last 100 ms. A basic onset-time continuum was created by replacing the first n ms (n = 5, 10, ..., 120) of the F1 stimulus with silence, then having the amplitude grow linearly over the next 4 ms to the amplitude of the original stimulus at that point in time. The onset-time continuum thus varied in steps of 5 ms, with 0 ms (onset synchrony) defined in terms of the original F1 and F2 stimuli. The stimuli then were modified to produce nonspeech stimuli that better resembled CV syllables. An initial 5 ms noise burst plus 10 ms of silence was added to the beginning of each F2 stimulus. The noise burst was a segment of white noise band-pass filtered with a two-thirds-octave bandwidth centered at either 1800 Hz (Rising F2) or 3000 Hz (Falling F2), intended as an analog of a stop-consonant release burst. The F1 component had a 15 ms segment of silence (ground) added to its beginning. The F1 and F2 components were produced simultaneously by separate DAC channels (sharing a common time base), then mixed as analog signals to create stimuli that differed in the onset of the (delayed) F1 component relative to the F2 component. A nominal onset difference of n ms thus consisted of a 5 ms burst of noise, 10 ms of silence, n ms of the F2 component alone, and then F1 and F2 combined for (300 - n) ms. If timing were specified in terms of VOT, measured between the release of the syllable and the onset of voicing (marked by the end of the F1 cutback), an n ms onset difference for the current stimuli would correspond to a VOT of (n + 15) ms.
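The F1 timing manipulation described above can be sketched in a few lines of Python. This is a simplified illustration, not the synthesis software used in the study: it generates only the 300 ms F1 sinewave at the stated 10 kHz rate, using the stated frequency breakpoints (180 to 330 Hz over 20 ms, hold, 300 Hz at the end) and amplitude envelope (45 ms rise, 155 ms hold, 100 ms fall), and applies the n ms cutback with a 4 ms linear ramp. The F2 glide, the filtered noise burst, 12-bit quantization, and analog mixing are omitted, and the helper names are hypothetical.

```python
import math

FS = 10_000   # sample rate (Hz), as stated in the Stimuli section
DUR_MS = 300  # duration of the tonal portion of each component (ms)


def interp(x, xp, fp):
    """Piecewise-linear interpolation of x through breakpoints (xp, fp)."""
    for i in range(len(xp) - 1):
        if x <= xp[i + 1]:
            frac = (x - xp[i]) / (xp[i + 1] - xp[i])
            return fp[i] + frac * (fp[i + 1] - fp[i])
    return fp[-1]


def f1_with_cutback(n_ms, fs=FS):
    """F1 sinewave whose first n_ms are silent, then ramped up over 4 ms."""
    samples, phase = [], 0.0
    for i in range(DUR_MS * fs // 1000):
        t = i * 1000 / fs                                          # time (ms)
        freq = interp(t, [0, 20, 200, 300], [180, 330, 330, 300])  # Hz
        amp = interp(t, [0, 45, 200, 300], [0, 1, 1, 0])           # envelope
        gain = min(max((t - n_ms) / 4.0, 0.0), 1.0)                # cutback + ramp
        samples.append(gain * amp * math.sin(phase))
        phase += 2 * math.pi * freq / fs   # integrate instantaneous frequency
    return samples


def onset_timeline(n_ms):
    """Segments (label, ms) of a stimulus with a nominal onset difference n_ms."""
    return [("noise burst", 5), ("silence", 10),
            ("F2 only", n_ms), ("F1 + F2", DUR_MS - n_ms)]


def equivalent_vot(n_ms):
    """VOT from release (burst onset) to the end of the F1 cutback (ms)."""
    return n_ms + 15
```

For example, `f1_with_cutback(40)` returns 3000 samples whose first 400 are silent, and `equivalent_vot(40)` gives 55 ms, the (n + 15) relation stated above.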

Procedure 

Both experiments used a 2IFC task. On each trial, two stimuli were presented, each with a different onset time (e.g., 5 ms vs. 45 ms). The computer randomly determined which of the two intervals contained the stimulus with the longer onset difference, and randomly selected the specific longer stimulus from a predetermined set. The subject's task was to indicate, by button press, which of the two intervals contained the stimulus with the longer onset difference. In the minimum-uncertainty (fixed) condition, the stimulus with the shorter onset time was constant throughout the block of trials, thus allowing generation of a psychometric function for only that base stimulus onset time. In the roving (maximum-uncertainty) condition, the stimulus with the shorter onset time differed from trial to trial, with the other stimulus always being longer in onset time.
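The difference between the fixed and roving conditions lies only in how the shorter (base) onset difference is chosen on each trial. A minimal sketch of the trial logic, assuming hypothetical increment and base sets (the report does not list the exact comparison values used in each block):

```python
import random


def make_trial(rng, condition, increments, fixed_base=5, roving_bases=(5, 15, 25)):
    """Build one 2IFC trial.

    Returns ((first_ms, second_ms), correct_interval), where the correct
    interval (1 or 2) holds the longer onset difference. `increments` and
    `roving_bases` are hypothetical sets, not values from the report.
    """
    if condition == "fixed":       # minimum uncertainty: one base for the block
        base = fixed_base
    elif condition == "roving":    # maximum uncertainty: base varies by trial
        base = rng.choice(roving_bases)
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    longer = base + rng.choice(increments)
    if rng.random() < 0.5:         # random assignment of the longer stimulus
        return (longer, base), 1
    return (base, longer), 2


if __name__ == "__main__":
    rng = random.Random(1)
    for condition in ("fixed", "roving"):
        print(condition, [make_trial(rng, condition, (10, 20, 30)) for _ in range(3)])
```

Because only the choice of `base` differs, the same scoring code serves both conditions; in the fixed case each block yields a psychometric function for a single base value.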

All subjects were trained initially with the extreme values (5 and 120 ms) of the stimulus type assigned to them. On each trial one of the stimuli had an onset difference of 5 ms and the other had an onset difference of 120 ms, with stimulus order randomly determined. The subjects were given feedback. They ran in short blocks of trials until they could perform the task with a high degree of accuracy (at least 90 percent correct). The individual subjects then were run with the 5 ms standard in a fixed condition, with comparison stimuli differing in large steps across the onset continuum. The goal of this next condition, which also included feedback, was to evaluate the approximate location of the steep portion of the psychometric function (e.g., determining the approximate size of the difference limen).

Experiment  1:  ISI 

The first experiment evaluated the effects of varying ISI on the discrimination of temporal order of onset. If the distinction between trace and context coding is important for the discrimination of temporal order of onset, then one would expect both trace and context coding for short interstimulus intervals and only context coding for longer interstimulus intervals, and thus better discrimination (smaller DLs) at the shorter ISIs.

Methods 

Subjects 

This experiment began with four subjects assigned to each of the two stimulus types. As the experiment progressed, some subjects left to take other positions and were not replaced. Two subjects resigned after the orientation conditions, leaving six subjects to complete Experiment 1. One of these subjects then resigned, leaving five subjects to complete Experiment 2.

Procedure 

A fixed discrimination procedure was used to generate a psychometric function for each of four interstimulus intervals (ISIs) between the two stimuli on each trial. On each trial the minimum (standard) onset difference was 5 ms, with longer onset differences for the comparison stimuli. Linear regression of z-score-transformed data was used to estimate the 75% difference limen. This condition was repeated with a different ISI until difference limens were estimated for ISI values of 100, 300, 500, and 1500 ms. Order of running was counterbalanced across subjects, except that all subjects initially ran with a 500 ms ISI.


Insert  Figure  2  about  here 


Results  and  Discussion 

The results are summarized in Figure 2, which plots the size of the difference limen as a function of ISI. It is clear that the DL for the Falling F2 stimuli (filled symbols) is much smaller than for the Rising F2 stimuli (open symbols). Although for four of the six subjects the DL is larger at the smallest ISI (100 ms) than at somewhat larger ISIs, the small average difference is not significant. Furthermore, the results would tend to indicate that if trace information is available at 100 ms, it serves to hinder, rather than help, the subjects. The function relating DL to ISI is essentially flat. Therefore, it would appear that subjects are not changing their strategy, in terms of the information available and employed, as a function of ISI. These results would seem to indicate that the distinction between trace and context coding is not important in the identification of temporal order of onset. This conclusion will be discussed after the next experiment.




Experiment 2: Testing Models

The second experiment provides a direct test of whether there is a perceptual threshold by examining whether the discrimination function follows a two-stage or a single linear relationship. In addition, this experiment compares the size of the difference limen under fixed versus roving procedures, thus further evaluating performance of this task in terms of the underlying coding of information.


Methods 

Subjects

A total of five subjects completed this experiment. All had participated in Experiment 1. Three of the subjects ran the conditions with the Falling F2 stimulus, and two ran with the Rising F2 stimulus.

Procedure 

In this experiment, ISI was fixed at 500 ms. Psychometric functions were generated for each subject under both fixed and roving discrimination conditions for base stimulus differences from 5 to 85 ms in 10 ms steps, except that the fixed discrimination task at 75 ms was not run. The difference limen was estimated from the 75% point of the linear regression line fitted to the z-transformed psychometric function.

Results  and  Discussion 

The roving and fixed discrimination results are summarized in the two panels of Figures 3 and 4. It is clear that the relationship between the difference threshold and the onset difference is best described by a two-stage linear function. For relatively short onset differences, the DL is a linear decreasing function of onset difference. The average slope of this segment of the function is -0.55, and thus is consistent with the predictions of the threshold model. The value of the threshold appears to be approximately 25.6 ms for the Falling F2 stimulus and approximately 45 ms for the Rising F2 stimulus. Converting these values of the DL to comparable values of VOT (adding burst and silence duration) yields an average phoneme equivalent of approximately 41 ms. Furthermore, when compared across the full onset time continuum and across the fixed and roving tasks, the sizes of the DL for the Falling (Figure 3) and Rising (Figure 4) F2 stimuli are equivalent.

At longer onset differences the size of the difference limen grows very slowly as a function of onset difference, with the function having a slope only slightly greater than 0 (mean = 0.05). Therefore, the DL is approximately constant across initial onset differences in this range. Thus, for stimuli above the threshold for detecting onset asynchrony, the DL for onset differences does not follow a Weber function; instead, it grows very slowly as a function of onset difference.

The lack of a Weber relationship for the longer onset differences is surprising. However, Zera and Green (1993) recently reported results on the detection of differences in temporal onset for complex stimuli. Zera and Green were investigating a different type of task, but their results parallel those reported here. Although we believe that the recognition of order of onset takes place at a different, higher level of perceptual processing, it is not surprising that the shapes of the two functions are similar. Finally, the results for the fixed discrimination task are not significantly better than those for the roving discrimination task, at least when measured in terms of the size of the difference limen.


Insert  Figures  3  and  4  about  here 


General  Discussion 

The subjective reports of our subjects are quite consistent with our own impressions about the temporal order task and with the measurement of other differences based upon temporal properties of stimuli. The subjects do not seem to be responding directly to the temporal properties of the stimuli, but rather to perceptual changes in the stimuli that are correlated with, or a function of, the temporal properties. Examining our stimuli from this perspective provides some insight into the nature of the temporal order task. A brief stimulus less than 10 ms in duration is heard as a click. If one increases the duration of the stimulus beyond approximately 10 ms, the stimulus begins to acquire a pitch-like quality, with a definite pitch achieved only for longer stimuli. This perceptual phenomenon is related to the spectrum of brief signals, where the effective signal bandwidth is an inverse function of duration (the sinc function). This well-known observation can be generalized to the task of identifying which of two stimuli had an earlier onset. With onset differences of less than a specific value, the earliest portion of the initial stimulus provides the subject with only broadband information, which is insufficient to identify the stimuli. Stated another way, subjects require more than 10 to 20 ms of a signal to begin to perceive any pitch-like quality, and thus the temporal onset difference must exceed this value for the subject to begin to have sufficient information to identify which of the stimulus components had an earlier onset. If the stimuli have slow rise times or are dynamically changing in frequency composition, it is not surprising that longer onset differences would be required for subjects to identify which of two stimuli had an earlier onset.
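The inverse relation between duration and bandwidth can be checked numerically. For a sinewave of frequency f0 gated on for T seconds with a rectangular window, the magnitude spectrum is a sinc-shaped lobe whose first null lies 1/T above f0, so a 10 ms fragment spreads its energy over roughly 200 Hz (null to null) and a 20 ms fragment over roughly 100 Hz. The sketch below assumes rectangular gating, a 1000 Hz test tone, and the 10 kHz sample rate used for the stimuli; the actual stimuli had ramped onsets and frequency glides, so this is only an idealized case.

```python
import cmath
import math


def gated_tone_spectrum(f0, dur_s, freqs, fs=10_000):
    """|DTFT| of a rectangularly gated sinewave at each frequency in freqs (Hz)."""
    n = round(dur_s * fs)
    x = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    return [abs(sum(v * cmath.exp(-2j * math.pi * f * i / fs)
                    for i, v in enumerate(x)))
            for f in freqs]


def first_null_above(f0, dur_s, fs=10_000, search_hz=400):
    """First local minimum of the spectrum above f0, on a 1 Hz grid."""
    freqs = list(range(int(f0), int(f0) + search_hz))
    mag = gated_tone_spectrum(f0, dur_s, freqs, fs)
    for i in range(1, len(mag) - 1):
        if mag[i] <= mag[i - 1] and mag[i] <= mag[i + 1]:
            return freqs[i]
    return None


if __name__ == "__main__":
    for dur_ms in (5, 10, 20):
        null = first_null_above(1000, dur_ms / 1000)
        print(f"{dur_ms:2d} ms tone: first null at {null} Hz "
              f"(main lobe about {2 * (null - 1000)} Hz wide)")
```

Halving the duration doubles the width of the main spectral lobe, which is the sense in which a very short onset difference carries only broadband information.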

When one uses highly practiced subjects with a limited set of conditions, one would expect to be able to push the threshold toward much shorter onset differences, and we have reported such results in the past (Pastore, Harris, & Kaplan, 1981). Conversely, when dealing with stimuli that vary in composition (e.g., natural speech), it should not be surprising that the limits on temporal order identification would be at a much longer onset asynchrony.




In a very real sense, the threshold for temporal order identification could be considered a perceptual, or even cognitive, limit on performance. In the temporal order identification task, a subject must acquire sufficient information about the stimulus with the earlier onset to be able to consistently apply the appropriate label to the stimulus. Although the threshold is specified in terms of temporal duration, the real goal for the subject is the acquisition of sufficient stimulus specification to perform the identification task. With longer values of TOT, the limits on the perception of pitch are no longer relevant. Instead, subjects probably are required to make a judgment based upon the temporal duration of the initial stimulus component. For these longer stimuli, the temporal order task becomes equivalent to the temporal discrimination task studied by Zera and Green (1993).

Using this conceptualization, we can now turn to the issue of context versus trace coding. It is highly doubtful that subjects can store a trace representation of the onset of the initial stimulus for a sufficient duration for it to be compared directly with the onset of the second stimulus. In addition to masking or interference from the later portion of the first stimulus, the task of the subject for short onset differences would be equivalent to attempting to compare the traces of two clicks or tone pips. Once the onset difference is sufficiently long, subjects can apply a label to the stimulus in terms of the onset difference, and then can perform the discrimination based upon whether or not the second stimulus falls in the same perceptual category. Therefore, subjects really are using only context coding, and are not using trace coding, in performing the discrimination task, at least for onset differences up to threshold.

Bibliography 

Blumstein, S.E., and Stevens, K.N. (1981). The search for invariant acoustic correlates of phonetic features. In P.D. Eimas and J.L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, N.J.: Erlbaum.

Hillenbrand, J. (1984). Perception of sine-wave analogs of voice onset time stimuli. Journal of the Acoustical Society of America, 75, 231-240.

Hirsh, I.J. (1974). Temporal order and auditory perception. In H.R. Moskowitz, B. Scharf, and J.C. Stevens (Eds.), Sensation and Measurement (pp. 251-258). Dordrecht, Holland: Reidel.

Hirsh, I.J. (1967). Information processing in input channels for speech and language: The significance of serial order of stimuli. In Brain Mechanisms Underlying Speech and Language (pp. 21-39). New York: Grune & Stratton.

Hirsh, I.J., and Fraisse, P. (1964). Simultanéité et succession de stimuli hétérogènes. L'Année Psychologique, 64, 1-19.

Hirsh, I.J. (1959). Auditory perception of temporal order. Journal of the Acoustical Society of America, 31, 759-767.

Kewley-Port, D., Watson, C.S., and Foyle, D.C. (1988). Auditory temporal acuity in relation to category boundaries: Speech and nonspeech stimuli. Journal of the Acoustical Society of America, 83, 1133-1145.

Li, X.-F., and Pastore, R.E. (1992). Evaluation of prototypes and exemplars for a phoneme place continuum. In M.E.H. Schouten (Ed.), Audition, Speech and Language (pp. 303-308). Berlin: Mouton de Gruyter.

Lisker, L., Liberman, A.M., Erickson, D.M., Dechovitz, D., and Mandler, R. (1977). On pushing the voice-onset-time (VOT) boundary about. Language and Speech, 20, 209-216.

Macmillan, N.A., Braida, L.D., and Goldberg, R.F. (1987). Central and peripheral processing in the perception of speech and nonspeech sounds. In M.E.H. Schouten (Ed.), The Psychophysics of Speech Perception (pp. 28-45). Boston: Nijhoff.

Miller, J.D., Wier, C.C., Pastore, R.E., Kelly, W.J., and Dooling, R.J. (1976). Discrimination and labeling of noise-buzz sequences with varying noise-lead times: An example of categorical perception. Journal of the Acoustical Society of America, 60, 410-417.

Moore, B.C.J. (1989). An Introduction to the Psychology of Hearing. New York: Academic Press.

Osgood, C.E. (1953). Method and Theory in Experimental Psychology. New York: Oxford.

Pastore, R.E. (1988). Burying straw men without graves: A reply to Kewley-Port, Watson, and Foyle (1988). Journal of the Acoustical Society of America, 84, 2262-2266.

Pastore, R.E., Layer, J.K., Morris, C.B., and Logan, R.J. (1988). Temporal order identification for tone/noise stimuli with onset transitions. Perception & Psychophysics, 44, 257-271.

Pastore, R.E. (1987a). Possible acoustic bases for the perception of voicing contrasts. In M.E.H. Schouten (Ed.), Psychophysics of Speech Perception (pp. 188-198). Boston: Martinus Nijhoff.

Pastore, R.E. (1987b). Categorical perception: Some psychophysical models. In S. Harnad (Ed.), Categorical Perception (Chap. 1). New York: Cambridge University Press.

Pastore, R.E., Harris, L.B., and Layer, J.K. (1981). Temporal order identification: Some parameter dependencies. Journal of the Acoustical Society of America, 71, 430-436.

Pastore, R.E. (1981). Possible psychoacoustic factors in speech perception. In P.D. Eimas and J.L. Miller (Eds.), Perspectives on the Study of Speech (Chap. 5). Hillsdale, N.J.: Erlbaum.

Pastore, R.E., Harris, L.B., and Kaplan, J.K. (1981). Temporal order identification: Some parameter dependencies. Journal of the Acoustical Society of America, 71, 430-436.

Pisoni, D.B. (1977). Identification and discrimination of the relative onset time of two-component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61, 1352-1361.

Rosen, S., and Howell, P. (1987). Is there a natural sensitivity at 20 ms in relative tone-onset-time continua? A reanalysis of Hirsh's (1959) data. In M.E.H. Schouten (Ed.), Psychophysics of Speech Perception (pp. 199-209). Boston: Nijhoff.

Sawusch, J.R. (1992). Auditory metrics for phonetic recognition. In M.E.H. Schouten (Ed.), The Auditory Processing of Speech: From Sounds to Words (pp. 315-321). New York: Mouton de Gruyter.




Schouten, M.E.H., and van Hessen, A.J. (1992). Different discrimination strategies for vowels and consonants. In M.E.H. Schouten (Ed.), The Auditory Processing of Speech: From Sounds to Words (pp. 309-314). New York: Mouton de Gruyter.

Soli, S. (1983). The role of spectral cues in discrimination of voice onset time differences. Journal of the Acoustical Society of America, 73, 2150-2165.

Summerfield, Q. (1982). Differences between spectral dependencies in auditory and phonetic temporal processing: Relevance to the perception of voicing in initial stops. Journal of the Acoustical Society of America, 72, 51-61.

Uchanski, R.M., Millier, K.M., Reed, C.M., and Braida, L.D. (1992). Effects of token variability on resolution for vowel sounds. In M.E.H. Schouten (Ed.), The Auditory Processing of Speech: From Sounds to Words (pp. 291-302). New York: Mouton de Gruyter.

Volaitis, L.E., and Miller, J.L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. Journal of the Acoustical Society of America, 92, 723-735.

Warren, R.M. (1982). Auditory Perception. New York: Pergamon.

Warren, R.M., and Byrnes, D.L. (1975). Temporal discrimination of recycled tonal sequences: Pattern matching and naming of order by untrained listeners. Perception & Psychophysics, 18, 273-280.

Warren, R.M., and Obusek, C.J. (1972). Identification of temporal order within auditory sequences. Perception & Psychophysics, 12, 86-90.

Watson, C.S., and Kewley-Port, D. (1988). Some remarks on Pastore (1988). Journal of the Acoustical Society of America, 84, 2266-2270.

Woodworth, R.S., and Schlosberg, H. (1954). Experimental Psychology. New York: Holt.

Zera, J., and Green, D.M. (1993). Detecting temporal onset and offset asynchrony in multicomponent complexes. Journal of the Acoustical Society of America, 93, 1038-1052.

Zue, V.W. (1976). Acoustic characteristics of stop consonants: A controlled study. Unpublished Ph.D. dissertation, Massachusetts Institute of Technology.


Acknowledgements 

This research was supported by grants F496209310033 and F496093 10327 from the Air Force Office of Scientific Research. The opinions, findings, conclusions, and recommendations are those of the authors and do not necessarily represent those of the granting agency.


Figure Captions

Figure 1. A summary of the hypothetical distributions of response categories in measuring the difference limen for some arbitrary stimulus value along a hypothetical continuum. Panel A illustrates the three expected perceptual categories: a center region (or interval) of uncertainty surrounding perceptual and physical equality, and regions of perceived differences above and below the interval of uncertainty. The center curve is a Gaussian distribution; the other two curves are cumulative Gaussian distributions adjusted so that the sum of ordinates at every point equals a constant. The three-category method has responses corresponding to each of the three distributions. Panel B shows the distribution of responses for measuring the DL using the two-category procedure. For any given stimulus value, the probability of a given "high" (or "low") response is based upon the probability derived from that distribution in Figure 1a plus 50 percent (for guessing) of the probability from the uncertainty distribution. The 50 percent threshold values from Figure 1a correspond with the 75 percent threshold values in Figure 1b.

Figure 2. The DL for magnitude of order of onset relative to a 5 ms standard onset difference is plotted as a function of the time interval (ISI) between the two stimuli presented on a given trial. The open symbols represent the Rising F2 stimuli and the filled symbols represent the Falling F2 stimuli. The solid lines represent the mean for the two stimulus conditions.

Figure 3. The DL for magnitude of order of onset is plotted as a function of the smaller difference in onset for the stimuli with the Falling F2 component. The two panels separately plot the results obtained under fixed and roving discrimination tasks. The separate symbols represent individual subjects. The two lines are the linear regression solutions for the sets of data for small and large initial differences in onset.

Figure 4. DL results for Rising F2 stimuli (see Figure 3 for details).
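The response rule in the Figure 1 caption (two-category "high" equals three-category "high" plus half of the "uncertain" probability) can be sketched numerically. As a simplification, the "uncertain" curve is taken as the remainder 1 - P(high) - P(low) of two cumulative Gaussians, which is bell-shaped but not exactly the Gaussian drawn in the figure; the boundary locations and spread are hypothetical.

```python
from statistics import NormalDist


def three_category(x, lower, upper, sigma=1.0):
    """P('low'), P('uncertain'), P('high') at stimulus value x (hypothetical model)."""
    unit = NormalDist()
    p_high = unit.cdf((x - upper) / sigma)   # cumulative Gaussian rising at `upper`
    p_low = unit.cdf((lower - x) / sigma)    # cumulative Gaussian falling at `lower`
    return p_low, 1.0 - p_high - p_low, p_high


def two_category_high(x, lower, upper, sigma=1.0):
    """P('high') when 'uncertain' responses must be split by guessing (50/50)."""
    _, p_uncertain, p_high = three_category(x, lower, upper, sigma)
    return p_high + 0.5 * p_uncertain
```

At the stimulus where P(high) = 0.5 in the three-category model (x = upper, with the lower boundary far away), the two-category probability comes out at 0.75, which is the correspondence stated in the caption.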


[Figure 1 image: panels "Three Category Distributions" and "Two Category Distributions"; y-axis: Percent; x-axis: Stimulus Number.]

[Figure 2 image: "D.L. as a Function of ISI"; y-axis: Difference Limen (ms).]

[Figure 3 image: x-axis: Temporal Onset Difference (ms).]

[Figure 4 image: panels "Roving Discrimination (Maximum Uncertainty)" and "Fixed Discrimination (Minimum Uncertainty)"; x-axis: Temporal Onset Difference (ms).]


Exploration of the phonetic structure of cues for place of articulation


Richard Pastore*, Xiaofeng Li, Jennifer Cho, Barbara Acker, and Shannon Farrington


Abstract 

A multi-task, multidimensional approach is used to evaluate the nature of perceptual space and the relative importance of auditory cues for the perception of initial-position voiced stop consonants of English. For each of several different vowel contexts, stimuli varying systematically in F2- and F3-onset frequency and in the nature of an initial release burst are examined from a number of different perspectives. For each vowel, a within-subject design is used, evaluating the stimuli in terms of classification, speeded classification, goodness for each of the possible phoneme categories, and similarity scaling. The scaling results are then submitted to a multidimensional scaling analysis. Each of the tasks and stimulus parameters has been the focus of prior investigations of phoneme categories, but they have never been combined to provide a complex, systematic picture of perceptual space or converging evidence for the nature of cues for phoneme categories.

End of Abstract


Running Heading: Phonetic Structure Exploration


Most research investigating the cues for specific phoneme categories, such as place of articulation, has tended to focus on the location of category boundaries defined along a single physical continuum, largely ignoring the nature, quality, and extent of perception within phonetic categories. More recently, some studies have been expanded to demonstrate "trading relations," which are really only the simple interaction of a limited range of values for two different physical continua, with the critical dependent variable again being the location of the category boundary defined along one of the two dimensions.

Although this research is relatively simple to conduct, the results of such limited-focus research cannot be expected to significantly advance our knowledge about the critical cues that define the perception of speech categories.

Among the more promising alternative approaches to investigating the nature of phonetic perceptual space is the multidimensional scaling of similarity ratings of stimuli (e.g., Pols, van der Kamp, & Plomp, 1969; Carroll & Chang, 1970; Soli, 1987; Bladon & Lindblom, 1981). However, even multidimensional scaling studies have tended to use relatively limited sets of stimulus dimensions. In recent years there also has been a growing interest in the possible role of prototypes or exemplars in defining phonetic categories (e.g., Samuel, 1982; Kuhl, 1991; Volaitis & Miller, 1992; Sussman, 1993). These prototype-oriented studies have begun to use several different measures (e.g., goodness rating, discrimination, or selective adaptation) to provide the beginnings of an evaluation of the perceptual structure of stimuli falling within and across phonetic categories. Despite this trend, very few studies have evaluated perception as a function of a number of different stimulus dimensions.

The related studies by Hoffman (1958) and Harris, Hoffman, Delattre, and Cooper (1958) are an excellent example of the evaluation of the perception of place of articulation as a function of several different physical properties, specifically F2-onset frequency, F3-onset frequency, and initial onset bursts. These early studies provide an excellent delineation of the limits (boundaries) of each perceptual category as a function of the three important stimulus variables. Somewhat more recently, Stevens and Blumstein (1978) evaluated similar variables in the perception of place of articulation, defining the variables more precisely and in terms of more dynamic properties.

The current research uses a number of different types of measures and techniques to provide a more detailed specification of
the nature of phoneme perceptual space for categories of initial-position voiced stop consonants varying in place of articulation. All of
the various measures utilized in this research have been employed separately in past investigations of phoneme perception, but never in
combination, and never to provide converging evidence for strong conclusions about phoneme perception. Furthermore,
all the basic physical properties of the stimuli also have been subject to extensive investigation, but not at the level of detail of the current
research.

The current research evaluates the nature of the perceptual space for the phoneme categories defined by /b/, /d/, and /g/.
The stimuli vary in F2- and F3-onset frequency, the presence and nature of an initial burst, and the specification of the following vowel.
These variables were identified as important cues in early research by Delattre et al. (1955) and by Hoffman (1958; Hoffman et al.,
1958). More recent important investigations include those by Stevens and Blumstein (1978), Kewley-Port (1982, 1983), Nearey and
Shammass (1987), Stevens (1992), and Sussman, McCaffrey, & Matthews (1991). In the current study, the stimuli first are evaluated in
terms of speeded classification, thus allowing specification of the location and extent of each phonetic category boundary, as well as
providing reaction time measures for applying labels. These results also allow direct comparisons of the current findings with the many
published studies focusing on category boundary location along one of the various dimensions investigated. The complete set of stimuli
then is evaluated in terms of the relative goodness of the labels "b," "d," and "g." Finally, a subset of stimuli is subjected to the rating
of similarity between all possible pairings of stimuli, thus allowing the use of multidimensional scaling techniques to evaluate the underlying
dimensionality of perception. The correspondence, or consistency, among the various measures provides an indication of the degree to
which the various measures are indicating similar, or different, underlying processes. Furthermore, the nature and shape of the defined
perceptual space provide a strong indication not only of the critical underlying stimulus dimensions and the symmetry of the perceptual
decision mechanism, but also of the relevance of prototype versus exemplar (or alternative) models of categorization.

Earlier Research

Li and Pastore (1992) present the initial work in this research project. This study systematically varied the F2- and F3-onset
frequency for a set of synthetic CV syllables (without an initial burst) based upon the vowel /a/. The stimulus parameters for this study
(along with those from Experiments 1 and 2) are summarized in Table 1. The study used an open-ended labeling task and a speeded
classification task on the full set of stimuli, plus a goodness rating task and a similarity rating task on a subset of the stimuli, to first provide
an overall perspective of the perceptual space for the phonetic categories /b/, /d/, and /g/. This experiment used 11 to 15 subjects per
condition, but different subjects for each condition. The basic finding for this vowel condition was that /b/ is defined by a low F2-onset
frequency (F2 < 1100 Hz), and thus a rising F2 at onset. For higher F2-onset frequencies the perceived category (/d/ or /g/, and almost never
"other") depended upon an interaction of F2- and F3-onset frequency. The goodness ratings indicated a difference in the perceptual
structure of the three categories, with higher levels of goodness for /d/ being concentrated and for /b/ being relatively uniform and
diffuse. A combination of speeded classification and labeling results was used to evaluate predictions from prototype and exemplar
models based upon the work of Nosofsky (1991). Both types of models provided excellent predictions, each accounting for 97% of the
variance.
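The comparison of prototype and exemplar models can be made concrete with a minimal sketch in the spirit of Nosofsky's (1986) generalized context model. The sketch assumes that each stimulus is a point in a normalized (F2-onset, F3-onset) space, that similarity decays exponentially with attention-weighted city-block distance, and that the prototype is simply the category centroid; the function names, the parameters `c`, `weights`, and `bias`, and any coordinates passed in are hypothetical, not the fitted values from Li and Pastore (1992).

```python
import math

def similarity(x, y, weights, c=1.0, r=1):
    """Exponential-decay similarity exp(-c * d), d = attention-weighted Minkowski distance (r=1: city-block)."""
    d = sum(w * abs(a - b) ** r for w, a, b in zip(weights, x, y)) ** (1.0 / r)
    return math.exp(-c * d)

def exemplar_probs(stimulus, exemplars, weights, c=1.0, bias=None):
    """Exemplar (GCM-style) choice probabilities: summed similarity to every stored exemplar of each category."""
    cats = list(exemplars)
    bias = bias or {k: 1.0 for k in cats}
    evidence = {k: bias[k] * sum(similarity(stimulus, e, weights, c) for e in exemplars[k]) for k in cats}
    total = sum(evidence.values())
    return {k: v / total for k, v in evidence.items()}

def prototype_probs(stimulus, exemplars, weights, c=1.0, bias=None):
    """Prototype choice probabilities: similarity to one prototype per category (assumed here to be the centroid)."""
    cats = list(exemplars)
    bias = bias or {k: 1.0 for k in cats}
    protos = {k: [sum(col) / len(col) for col in zip(*exemplars[k])] for k in cats}
    evidence = {k: bias[k] * similarity(stimulus, protos[k], weights, c) for k in cats}
    total = sum(evidence.values())
    return {k: v / total for k, v in evidence.items()}

def variance_accounted_for(observed, predicted):
    """Proportion of variance in observed labeling proportions accounted for by model predictions."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot
```

Fitting either model would amount to choosing `c`, `weights`, and `bias` to maximize `variance_accounted_for` between observed labeling proportions and predicted probabilities; the 97% figure above refers to the original fits, not to this sketch.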

Current  Research 

The research completed to date under the current project represents several improvements over the Li and Pastore study. The
stimuli were synthesized using the CSRE 4.0 version of the Klatt synthesizer, with increased variation in F0 and amplitude, especially at
offset, to improve the perceived quality of the stimuli. In addition to varying F2- and F3-onset frequency (with finer sampling of F3-onset
frequency), three different onset burst conditions (no burst, low-frequency burst, and high-frequency burst) were added. The stimulus
set thus consisted of the factorial combination of values of F2-onset frequency, F3-onset frequency, and three different burst conditions.
The two experiments completed to date differed in terms of the vowel, and thus in the selection of F2- and F3-onset frequencies. Each
experiment used a within-subject design, with each of six to eight subjects (normal-hearing native speakers of American English) completing
all tasks with the given set of stimuli. The approximately 100 stimuli were subjected to speeded classification (open-ended labeling with the
labels "b," "d," "g," and "other," with RT measured) and to separate goodness rating tasks (one each for /b/, /d/, and /g/). Because the
number of distinct trials grows with the square of the number of stimuli (24 stimuli require 576 trials for one presentation
of each stimulus pairing; 33 stimuli require 1089 trials), the evaluation of the similarity between pairs of stimuli was based upon a subset
of the original stimuli; this task was repeated five to seven times, each with a new randomization of stimuli. The similarity scaling utilized
each of the three burst conditions for a limited set (8 to 11) of different values of F2- and F3-onset frequencies selected on the basis of
the goodness rating results. All stimuli were presented binaurally over TDH-49 earphones at a comfortable listening level. Subjects were
run singly or in pairs in commercial sound chambers.
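The factorial stimulus set and the quadratic growth in similarity trials can be sketched as follows. Stimuli are represented only by level indices (the actual onset frequencies are those of Table 1, not reproduced here), and the helper names are hypothetical; every ordered pairing, including a stimulus paired with itself, is presented once per block in a fresh random order, as described above for each repetition.

```python
import itertools
import random

def factorial_stimuli(n_f2, n_f3, bursts=("none", "low", "high")):
    """Every combination of F2-onset level index, F3-onset level index, and burst condition."""
    return list(itertools.product(range(n_f2), range(n_f3), bursts))

def similarity_trials(stimuli, seed=None):
    """One block of similarity trials: all ordered pairs (n * n of them), freshly shuffled."""
    rng = random.Random(seed)
    pairs = list(itertools.product(stimuli, repeat=2))
    rng.shuffle(pairs)
    return pairs
```

With seven F2 levels and five F3 levels (Experiment 1), `factorial_stimuli(7, 5)` yields 105 stimuli, whereas one block of similarity ratings on 24 or 33 stimuli already requires 576 or 1089 trials. Calling `similarity_trials` with a different seed for each of the five to seven repetitions gives each block its own randomization.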


Experiment  1 

The first condition was based upon the vowel /i/ and factorially combined seven values of F2-onset frequency with five values
of F3-onset frequency and the three different burst conditions, as summarized in the middle column of Table 1. Figure 1 presents
the labeling results in terms of percent labeling for the labels "b," "d," and "g" (the label "other" was almost never used) as a function of
F2-onset frequency, F3-onset frequency, and type of burst. It is quite clear that /b/ is defined by a combination of low F2- and low F3-onset
frequencies. Redefining the stimuli in terms consistent with Stevens (1992), /b/ seems to be specified by the upward movement from onset
of both the F2 and F3 resonances, but can tolerate a flat F3 resonance when the F2 resonance is clearly rising. Adding a low- or high-frequency
burst decreases the rate of classifying these stimuli as /b/. A burst also shrinks the range of stimuli accepted as /b/,
removing the falling or flat F3 stimuli from this category. Thus, those stimuli which mixed the direction of the F2 and F3 resonances are
probably somewhat ambiguous, with "b" being the default category label (this conclusion is confirmed by the goodness ratings).
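The description in terms of rising, flat, or falling resonances can be sketched as a simple coding rule: compare each formant's onset frequency with its value in the following vowel. The vowel target frequencies and the 50 Hz tolerance for calling a transition "flat" are hypothetical assumptions, and every combination other than both rising or both falling is lumped together as "mixed," mirroring the three symbol types used in Figure 3.

```python
def transition_direction(onset_hz, target_hz, flat_tol_hz=50.0):
    """Direction of a formant transition from onset to the (assumed) vowel target frequency."""
    if target_hz - onset_hz > flat_tol_hz:
        return "rising"
    if onset_hz - target_hz > flat_tol_hz:
        return "falling"
    return "flat"

def transition_pattern(f2_onset, f3_onset, f2_target, f3_target, flat_tol_hz=50.0):
    """Joint F2/F3 coding: 'both rising', 'both falling', or 'mixed' (which here also covers flat cases)."""
    d2 = transition_direction(f2_onset, f2_target, flat_tol_hz)
    d3 = transition_direction(f3_onset, f3_target, flat_tol_hz)
    if d2 == d3 == "rising":
        return "both rising"
    if d2 == d3 == "falling":
        return "both falling"
    return "mixed"
```

In this coding, a clearly rising F2 with a flat F3, which the labeling data suggest is still acceptable as /b/ without a burst, falls into the "mixed" group; a finer scheme would keep "flat" as a separate value.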

In the absence of a release burst, categorization of the phoneme /g/ seems to be defined in terms of high F2- and F3-onset
frequencies, and thus the downward movement from onset of the F2 and F3 resonances. Adding a release burst reduces the /g/ category
range to the stimuli with the more sharply falling F2 resonance (F2-onset > 2200 Hz). The phoneme /d/ seems to involve F2 and F3
resonances that move in somewhat opposite directions, with this category really existing primarily in the presence of a high-frequency noise
burst at onset, and even then the category does not seem strong. The pattern of F2- and F3-onset frequencies in defining phonetic
categories thus seems to be quite different from that found by Li and Pastore (1992) for the vowel /a/.

Insert Figures 1 and 2 about here


The goodness rating tasks used a scale of 1 (very poor example of the phoneme category) to 7 (excellent example of the
phoneme category). The goodness rating results are summarized in Figure 2. It should be noted that there were considerable individual
differences in the ratings of the optimum stimulus for any given category, and also in the use of the rating scale to indicate goodness.
Such individual differences tended to cause an overall regression of the goodness ratings toward the mean, although the general pattern
of results seems relatively consistent across subjects. For /b/ the overall pattern of goodness results is generally equivalent to the labeling
results. However, for /g/, the addition of either type of initial release burst did not significantly alter the goodness of the stimuli with a sharply
falling F2 resonance. Moderate levels of goodness for /d/ are found with mixed F2- and F3-resonance changes accompanied by a high-frequency
burst at onset.


Insert Figure 3 about here


The similarity scaling results for a sampling of 11 values of onset frequencies crossed with the three types of burst conditions
(33 stimuli) were subjected to a multidimensional scaling analysis, which yielded an excellent fit in two dimensions (it is more interesting
to obtain a solution that yields fewer or different dimensions than the physical variables manipulated). The stimuli in the upper panel of Figure 3 are
coded in terms of burst type (fill of symbols) and the relative direction of the F2- and F3-onsets (symbol type). It is quite clear from these results
that dimension 2 is coding burst type, with the no-burst (unfilled symbols) and high-burst (darkest filled symbols) stimuli being at the two
ends of the dimension and the low-burst stimuli (lightly filled symbols) in the center. The two extremes of dimension 1 represent the F2
and F3 resonances both rising (◄) and both falling (►) together, with the middle portion of the dimension representing a mixture of rising
and falling resonances (♦).
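The scaling step can be illustrated with classical (Torgerson) metric MDS, a swapped-in technique chosen for brevity; the report does not state which MDS algorithm was used, and a nonmetric or individual-differences procedure (e.g., Carroll & Chang, 1970) may have been applied instead. The sketch assumes a rating scale on which higher means more similar, averages the two presentation orders of each pair, and converts similarity to dissimilarity by subtracting from the scale maximum; `ratings_to_dissimilarity` and `classical_mds` are hypothetical helper names.

```python
import numpy as np

def ratings_to_dissimilarity(ratings, max_rating):
    """Average the A-B and B-A ratings, then turn similarity into dissimilarity (zero on the diagonal)."""
    r = np.asarray(ratings, dtype=float)
    sym = (r + r.T) / 2.0
    d = max_rating - sym
    np.fill_diagonal(d, 0.0)
    return d

def classical_mds(d, n_dims=2):
    """Torgerson MDS: double-center the squared dissimilarities and keep the largest eigenvectors."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                  # implied inner-product (Gram) matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]      # indices of the largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

Plotting the two columns of `classical_mds(ratings_to_dissimilarity(r, 7), 2)` would give a configuration comparable in spirit to Figure 3, although classical MDS axes are defined only up to rotation and reflection, so labeling a dimension as "burst type" still requires inspecting where the coded stimuli fall.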

The lower panel of Figure 3 replots the multidimensional scaling stimuli in terms of the labeling results. Large, bold symbols designate a very high
percentage (90-100%) labeling for the given stimulus, whereas small, mixed symbols represent approximately equal use of two phonetic
labels. Note that dimension 1, coding the change in F2- and F3-onset resonances, clearly differentiates /b/ from the other phonetic categories,
and may provide some very small differentiation between /d/ and /g/; these scaling results are consistent with the classification and
goodness results for /b/ in this vowel context. Dimension 2, the nature of the burst, has a small effect on the goodness of /b/ and
tends to provide some differentiation between /d/ and /g/. Therefore, the change in resonant frequency at onset seems to be important
for defining /b/ and differentiating it from the other two phonemes, with the nature of the burst primarily distinguishing between /d/
and /g/. However, based upon the levels of labeling and goodness observed, we suspect that other factors not captured in our stimuli are
needed to more fully differentiate /d/ from the other voiced phoneme categories.

Experiment 2

The second experiment in this study was still being conducted at the time that this report was prepared. This experiment
reexamines the condition with the vowel /a/ studied in the original Li and Pastore experiment, but with a totally new set of synthetic
stimuli (with better sampling of F3), and adds the burst conditions utilized in Experiment 1. It is critical for the project that the essential
no-burst classification results replicate those of Li and Pastore (1992), thus demonstrating the stability of these findings. The addition
of the two burst conditions then builds upon the basic findings. The current report is based upon the speeded classification and goodness
rating results for eight subjects, all of whom currently are being run under the scaling conditions.


Insert Figures 4 and 5 about here

Figure 4 summarizes the labeling results from Experiment 2 for seven of the subjects; one subject produced labeling and
goodness rating results which were in places discrepant relative to the other subjects (e.g., assigning goodness ratings below 2.0, or labeling
percentages below 20%, to stimuli which received ratings above 6.0, or percentages above 80%, from all of the other subjects, and vice
versa). The no-burst classification results for the perception of /b/ are consistent with those reported by Li and Pastore using a
completely different sample of stimuli based upon the same vowel. All stimuli with F2-onset frequencies below approximately 1200 Hz
are classified as /b/. At higher values of F2-onset frequency the predominant labeling category is /d/, with some shift toward /g/ at low
values of F3. The new classification results are those for the two conditions which add an initial burst. In each of these conditions the
presence of an onset burst does not change the labeling of /b/. Furthermore, the rate of /b/ labeling for low F2 stimuli appears to be
relatively uniform across burst conditions and F3-onset frequencies. Therefore, the classification results indicate that F2-onset frequency (or direction of
change from onset) differentiates /b/ from the other phonetic categories (/d/ and /g/).
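The exclusion criterion described above can be written as a simple check. This is only an illustration of the stated thresholds (below 2.0 versus above 6.0 for goodness ratings; below 20% versus above 80% for labeling percentages), not the procedure actually used, and the data layout `ratings[subject][stimulus]` and the function name are hypothetical assumptions.

```python
def discrepant_cells(ratings, subject, low=2.0, high=6.0):
    """Stimuli where `subject` is below `low` while every other subject is above `high`, or vice versa."""
    others = [s for s in ratings if s != subject]
    flagged = []
    for stim, value in ratings[subject].items():
        other_vals = [ratings[o][stim] for o in others]
        if value < low and all(v > high for v in other_vals):
            flagged.append(stim)
        elif value > high and all(v < low for v in other_vals):
            flagged.append(stim)
    return flagged
```

For labeling percentages the same function applies with `low=20, high=80`. A subject would be set aside only if such cells recur "in places," which remains a judgment call rather than a rule encoded here.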

At high F2-onset frequencies (falling F2 resonances), adding a low-frequency burst has a relatively uniform effect, causing these
stimuli to be perceived primarily as /g/. Finally, changing to a high-frequency burst causes these high F2-onset frequency stimuli to be
perceived almost uniformly as /d/. It appears that the F3-onset frequency, at least for the range of values sampled, has very little effect
on perception for this vowel. Therefore, the classification results indicate that /d/ and /g/ are distinguished by burst type for both /a/ and /i/.

Figure 5 summarizes the goodness rating results for the seven subjects. Each subject exhibited a median rating of between 6
and 7 for at least one stimulus in each of the three phoneme categories, as well as a median rating of 1.0, the poorest possible rating,
demonstrating that all subjects used the full range of rating values in each rating task. Although stimuli with higher ratings tended to be
grouped for individual subjects, the exact location of the highest rating differed somewhat between subjects, thus causing the average of
the median ratings to be somewhat lower than those typical of individual subjects, but still quite high within categories.
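Why averaging lowers the group peak can be seen with a minimal sketch using hypothetical ratings: each subject's median is computed per stimulus, and those medians are then averaged across subjects. When two subjects place their best stimulus at different points, the group average falls below either individual's peak. The function name and data layout are illustrative assumptions.

```python
from statistics import mean, median

def mean_of_medians(ratings_by_subject):
    """ratings_by_subject[subject][stimulus] -> list of repeated ratings.

    Returns (per-subject medians, mean of those medians across subjects for each stimulus)."""
    subjects = list(ratings_by_subject)
    stimuli = list(ratings_by_subject[subjects[0]])
    per_subject = {s: {k: median(v) for k, v in ratings_by_subject[s].items()} for s in subjects}
    group = {k: mean(per_subject[s][k] for s in subjects) for k in stimuli}
    return per_subject, group
```

With hypothetical data in which one subject's medians are 7 and 4 for two stimuli and the other subject's are 4 and 7, each individual peak is 7, but the group average is 5.5 for both stimuli: high, yet lower than either subject's best token.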

In general, the goodness rating results tend to parallel the labeling results: /b/ is characterized by a low F2-onset frequency
and is independent of both an initial burst and the nature of the F3-onset frequency; /g/ and /d/ are characterized by a high F2-onset
frequency together with an initial noise burst, with burst type differentiating between the two categories. One difference between the
goodness and the classification results is found under the no-burst condition for the higher F2 stimuli. Although these stimuli are classified
as /d/, none of the stimuli are very good tokens of this category: /d/ seems to be a default labeling category.

The contrast between the current approach and the more traditional approach based upon the location of category boundaries
can be illustrated by referring back to the low-burst condition in Fig. 4. If one were to hold F3-onset frequency constant at 2000 Hz and
vary F2-onset frequency, one would find two discrete categories (/b/ and /g/) with a 50 percent labeling boundary at approximately 1200
Hz. The traditional interpretation of such results would be that F2-onset frequency is a cue which differentiates between /b/ and /g/.
As we have seen, a low F2-onset frequency is definitely a cue for /b/ and differentiates it from other phonetic categories, but F2-onset
frequency is not an adequate cue for /g/. Expanding upon this example, assume that we now run another labeling condition with F3-onset
frequency fixed at 2800 Hz, again varying F2-onset frequency. Here we would find a category boundary at approximately 1275 Hz, thus
apparently demonstrating a trading relation between F3- and F2-onset frequencies for the contrast between /b/ and /g/. However,
we have already seen that F3-onset frequency has very little effect on phoneme category, or category goodness, for this vowel. Instead,
we appear to have relatively simple, straightforward definitions of categories based upon F2 onset and burst type. We see these categories
when we better capture the complexity of the stimuli (as in Figures 3 and 4) rather than trying to draw strong conclusions based upon small
changes in the location of a labeling boundary along a single dimension or slice through the perceptual space.
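The traditional boundary measure discussed above can be sketched as a linear interpolation of the 50% crossover along a single slice through the stimulus space. The labeling proportions used with it below are hypothetical, chosen only to show that two slices can yield different boundaries; as argued above, such a shift by itself says nothing about whether the second dimension matters to category membership or goodness. The function name is an illustrative assumption.

```python
def boundary_50(f2_values, p_b, criterion=0.5):
    """F2-onset value (Hz) where the /b/ labeling proportion first crosses `criterion`,
    found by linear interpolation between adjacent sampled stimuli; None if it never crosses."""
    points = list(zip(f2_values, p_b))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (y0 - criterion) * (y1 - criterion) <= 0 and y0 != y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None
```

For example, `boundary_50([900, 1100, 1300, 1500], [1.0, 0.75, 0.25, 0.0])` returns 1200.0, while a second hypothetical slice with proportions `[1.0, 0.9, 0.4, 0.0]` places the boundary near 1260 Hz. A boundary shift of this kind is exactly what would be read as a trading relation under the single-slice approach.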

It is tempting to draw some conclusions about the role of transitions for low-frequency formants in defining /b/. However,
similar types of results really need to be collected for other vowels before any strong conclusions are conjectured.




Other Experiments

Upon completion of the similarity scaling condition for Experiment 2, we will (1) collect similar data for the /u/ vowel context
and (2) re-examine the context of the /i/ vowel using a new set of stimuli to attempt to obtain better exemplars of /d/. At that point we
will have sampled vowels from three major front-back locations. At that time we will prepare a major manuscript describing the results
obtained to date. In addition, we will explore the perceptual space for these phonetic contrasts (/b/, /d/, and /g/) in the context of other
vowels, and for other possible cues for phoneme contrast.


Bibliography 

Best, C. T., Morrongiello, B., and Robson, R. (1981). Perceptual equivalence of acoustic cues in speech and nonspeech perception.
Perception and Psychophysics, 29, 191-211.

Bladon, R. A., and Lindblom, B. (1981). Modeling the judgement of vowel quality differences. Journal of the Acoustical Society of
America, 69, 1414-1422.

Blumstein, S. E., Isaacs, E., and Mertus, J. (1982). The role of the gross spectral shape as a perceptual cue to place of articulation in initial
stop consonants. Journal of the Acoustical Society of America, 72, 43-50.

Blumstein, S. E., and Stevens, K. N. (1979). Acoustic invariance in speech production: Evidence from measurements of the spectral
characteristics of stop consonants. Journal of the Acoustical Society of America, 66, 1001-1017.

Carroll, J. D., and Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of
"Eckart-Young" decomposition. Psychometrika, 35, 283-319.

Delattre, P. C., Liberman, A. M., and Cooper, F. S. (1955). Acoustic loci and transitional cues for consonants. Journal of the Acoustical
Society of America, 27, 769-773.

Harris, K. S., Hoffman, H. S., Liberman, A. M., Delattre, P. C., and Cooper, F. S. (1958). Effect of third-formant transitions on the
perception of the voiced stop consonants. Journal of the Acoustical Society of America, 30, 122-126.

Hoffman, H. S. (1958). Study of some cues in the perception of the voiced stop consonants. Journal of the Acoustical Society of America,
30, 1035-1041.

Kewley-Port, D. (1982). Measurement of formant transitions in naturally produced stop consonant-vowel syllables. Journal of the
Acoustical Society of America, 72, 379-389.

Kewley-Port, D. (1983). Time-varying features as correlates of place of articulation in stop consonants. Journal of the Acoustical Society
of America, 73, 322-335.

Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories, monkeys
do not. Perception and Psychophysics, 50, 93-107.

Li, X., and Pastore, R. E. (1992). Evaluation of prototypes and exemplars for a phoneme place continuum. In M. E. H.
Schouten (Ed.), Audition, Speech and Language. Berlin: Mouton-De Gruyter, 303-308.

Nearey, T. M., and Shammass, S. E. (1987). Formant transitions as partly distinctive invariant properties in the identification of voiced
stops. Canadian Acoustics, 15(4), 17-24.

Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology:
General, 115, 39-57.

Nosofsky, R. M. (1991). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of
Experimental Psychology: Human Perception and Performance, 17, 3-27.

Pols, L. C. W., van der Kamp, L. J. Th., and Plomp, R. (1969). Perceptual and physical space of vowel sounds. Journal of the Acoustical
Society of America, 46, 458-467.

Repp, B. H. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of
perception. Psychological Bulletin, 92, 81-110.

Repp, B. H. (1983). Trading relations among acoustic cues in speech perception are largely a result of phonetic categorization. Speech
Communication, 2, 341-362.

Samuel, A. G. (1982). Phonetic prototypes. Perception and Psychophysics, 31, 307-314.

Soli, S. (1983). The role of spectral cues in discrimination of voice onset time differences. Journal of the Acoustical Society of America,
73, 2150-2165.

Stevens, K. N., and Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society
of America, 64, 1358-1368.

Stevens, K. N., and Blumstein, S. E. (1981). The search for invariant acoustic correlates of phonetic features. In P. D. Eimas and J. L. Miller
(Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Erlbaum.

Stevens, K. N. (1992). Colloquium at Cornell University, March 1992.

Studdert-Kennedy, M., Liberman, A. M., Harris, K. S., and Cooper, F. S. (1970). Motor theory of speech perception: A reply to Lane's
critical review. Psychological Review, 77, 234-249.

Sussman, H. M., McCaffrey, H. A., and Matthews, S. A. (1991). An investigation of locus equations as a source of relational invariance for
stop place categorization. Journal of the Acoustical Society of America, 90, 1309-1325.

Sussman, J. E. (1993). A preliminary test of prototype theory for a [ba]-to-[da] continuum. Journal of the Acoustical Society of America,
93, 2392 (Abstract).

Volaitis, L. E., and Miller, J. L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure
of voicing categories. Journal of the Acoustical Society of America, 92, 723-735.


0 


’i 


4 


[’hone tic  Structure  Fxploratinn 


5 


Zue, V. W. (1976). Acoustic characteristics of stop consonants: A controlled study. (Unpublished PhD dissertation). Massachusetts
Institute of Technology.

Acknowledgements 

This research was supported by grant F49620-93-1-0033 from the Air Force Office of Scientific Research. The opinions, findings,
conclusions, and recommendations are those of the authors and do not necessarily represent those of the granting agency.

Figure  Captions 

Figure 1. Classification of the phonemes /b/, /d/, and /g/ in the context of the vowel /i/ as a function of F3-onset frequency (abscissa of each bar
graph), F2-onset frequency (separate rows of bar graphs), and the nature of the initial burst (columns of bar graphs). For each individual
stimulus, the percent labeling for the categories /b/ (red with light diagonal lines), /d/ (light blue with knotted pattern), and /g/ (yellow
with brick pattern) is indicated. Because the category "other" was seldom used, the rate of this response is not indicated other than by
the sum of the other three labeling rates being less than 100%.

Figure 2. Goodness rating results for vowel /i/. The organization of this Figure is equivalent to that of the classification results in Figure 1.
The highest level of goodness is 7.0, and goodness ratings were obtained separately for each of the three phoneme categories.

Figure 3. Two-dimensional solution for the similarity scaling results, coded in the upper panel in terms of the nature of the F2- and F3-onset
resonances and the nature of the burst. The two resonances can be both rising (◄), both falling (►), or a mixture of rising and falling (♦).
The burst can be absent (open symbols), low frequency (light fill), or high frequency (dark fill). The lower panel is coded in terms of
the relative frequencies of labeling contained in the classification results. A large symbol indicates a very high rate of labeling for the
given category, whereas a small, double symbol indicates between 40 and 60 percent labeling for the two phoneme categories, with the
more frequent category listed first.

Figure 4. Classification results for the vowel /a/. (See Figure 1 for a description of the Figure organization.) In this Figure the dark blue
category represents the use of the label "other."


Figure 5. Goodness rating results for vowel /a/. (See Figure 2 for a description of the organization.)


[Figure panels not reproduced: "Vowel /i/" (including a "High Burst" panel; onset frequencies in Hz) and "Goodness Ratings for Vowel /a/".]


